Download Subtitles for Harvard CS50 2026 Computer Science Course

Harvard CS50 (2026) – Full Computer Science University Course

freeCodeCamp.org

40562 segments EN


[00:00]

If you want to learn about computer

[00:01]

science and the art of programming, this

[00:03]

course is where to start. CS50 is

[00:06]

considered by many to be one of the best

[00:08]

computer science courses in the world.

[00:11]

This is a Harvard University course

[00:13]

taught by Dr. David Malan, and we are proud

[00:16]

to bring it to the freeCodeCamp

[00:18]

channel. Throughout a series of

[00:19]

lectures, Dr. Malan will teach you how to

[00:21]

think algorithmically and solve problems

[00:24]

efficiently. And make sure to check the

[00:27]

description for a lot of extra resources

[00:29]

that go along with the course.

[00:54]

>> [music]

[00:59]

[music]

[01:14]

>> All right. This is

[01:17]

[applause]

[01:20]

This is CS50, Harvard University's

[01:23]

introduction to the intellectual

[01:25]

enterprises of computer science and the

[01:26]

arts of programming. My name is David

[01:28]

Malan and this is week zero. And by the

[01:30]

end of today, you'll know not only what

[01:32]

these light bulbs here spell, but so

[01:34]

much more. But why don't we start first

[01:35]

with the uh the elephant or the elephant

[01:38]

in the room. That is artificial

[01:39]

intelligence, which is seemingly

[01:41]

everywhere over the past few years. And

[01:42]

it's been said that it's going to change

[01:44]

programming. And that's absolutely the

[01:46]

case. It's been that way actually for

[01:47]

the past several years and is only going to

[01:49]

get to be the case all the more. But

[01:51]

this is an incredibly exciting time.

[01:53]

This is actually a good thing I do think

[01:55]

insofar as now using AI in any number

[01:57]

of forms. You can ask the computer to

[01:59]

help solve some problem for you. You can

[02:01]

find some bug or mistake in your code.

[02:03]

Better still increasingly you can tell

[02:05]

the AI what additional features you want

[02:07]

to add to your software. And this is

[02:09]

huge because even in industry for years,

[02:11]

humans have been programming in some

[02:12]

form for decades, building products and

[02:14]

solutions to problems, the reality is

[02:16]

that you and I as humans have long been

[02:19]

the bottleneck. There's only so many

[02:20]

hours in the day. There's only so many

[02:22]

people on your team or in your company

[02:24]

and there's so many more bugs that you

[02:26]

want to solve and so many more features

[02:28]

that you want to implement. But at the

[02:30]

same time, you still really need to

[02:32]

understand the fundamentals. And indeed,

[02:34]

a class like this CS50 has never been

[02:36]

about teaching you how to program. Like

[02:38]

that's actually one of the side effects

[02:39]

of taking a class like this. But the

[02:41]

overarching goal is to teach you how to

[02:42]

think, how to take input and produce

[02:44]

correct output and how to master these

[02:46]

and other tools. And so by the end of

[02:48]

the semester, not only will you be

[02:50]

acquainted with

[02:52]

languages like Scratch, which we'll

[02:53]

touch on today if you've not seen it

[02:54]

already, languages like C and Python and

[02:57]

SQL, HTML, CSS, and JavaScript. You'll

[03:00]

be able to teach yourself new things

[03:02]

ultimately, and ultimately be able to

[03:04]

tell computers increasingly what it is

[03:06]

you want it to do. But you'll still be

[03:08]

in the driver's seat, so to speak.

[03:09]

You'll be the pilot. You'll be the

[03:11]

conductor. Whatever your preferred

[03:12]

metaphor is. And that's what I think is

[03:14]

so empowering still about learning

[03:16]

introductory material, foundational

[03:17]

material, because you'll know what

[03:19]

you're ultimately talking about and what

[03:20]

you can in fact solve. And we've been

[03:22]

through this before, like when

[03:23]

calculators came out. It's still

[03:25]

valuable, I dare say, all these years

[03:26]

later to still know how to do addition

[03:28]

and subtraction and whatnot. And yet, I

[03:30]

think back on some of my own math

[03:31]

classes. I remember learning so many

[03:33]

darn ways in college how to take

[03:35]

derivatives and integrals. And after

[03:37]

like the sixth process of that, I sort of

[03:39]

realized, okay, I get it. I get the

[03:40]

idea. Do I really need to know this many

[03:42]

ways? And here too, with AI and with

[03:44]

code, can you increasingly sort of

[03:46]

master the ideas and then lean on a

[03:49]

co-pilot assistant to actually help you

[03:51]

solve those same problems. So, let's do

[03:53]

some of this ourselves here. In fact,

[03:55]

just to give you a teaser of what you'll

[03:56]

be able to do yourselves before long,

[03:59]

let me go ahead and open up a little

[04:00]

something called Visual Studio Code, aka

[04:03]

VS Code for short. This is popular

[04:05]

largely open-source or free software

[04:07]

that's used by real world people in

[04:09]

industry to write code. And it's

[04:10]

essentially a text editor similar to

[04:12]

Notepad if you're familiar with that or

[04:14]

TextEdit, kind of like Google Docs but

[04:16]

no boldfacing and underlining and

[04:18]

things like that that you'd find in word

[04:19]

processing programs. And this is CS50's

[04:21]

version thereof. We're going to

[04:22]

introduce you to this all the more next

[04:24]

week. But for now, let's just give you a

[04:26]

taste of what you can do with an

[04:28]

environment like this. So I'm going to

[04:29]

switch over to this program already

[04:31]

running VS Code. And in this uh bottom

[04:35]

of the screen, you're going to see a

[04:36]

so-called terminal window. Again, more

[04:37]

on that next week. But it's in this

[04:38]

terminal window that I can write

[04:40]

commands that tell the computer what I

[04:41]

want it to do. For instance, let's

[04:43]

suppose just for the sake of discussion

[04:45]

that I want to make my own chatbot, not

[04:48]

ChatGPT or Gemini or Claude, like

[04:50]

let's make our own in some sense. So,

[04:52]

I'm going to code up a program called

[04:54]

chat.py. And you might be familiar that

[04:56]

I'm using a language here: .py, it's just

[05:00]

called Python. And if unfamiliar, you're

[05:01]

in good company. You'll learn that too

[05:03]

within a few weeks. And at the top of

[05:04]

the file here, I can write my code. And

[05:06]

at the bottom of the file of the window

[05:08]

here, I can run my code. So, here's how

[05:11]

relatively easy it is nowadays to write

[05:14]

even your own chatbot using the AI

[05:17]

technologies that we already have. I'm

[05:18]

going to go ahead and type a command

[05:20]

like import, uh, I'm going to go ahead

[05:23]

and type the following: from openai

[05:26]

import OpenAI. We'll learn what this

[05:28]

means ultimately, but what I'm going to

[05:30]

do is write my own program on top of an

[05:33]

API, application programming interface

[05:36]

that someone else provides, a big

[05:37]

company called OpenAI, and they're

[05:39]

providing features and functionality

[05:41]

that now I can write code against. I'm

[05:43]

going to create a so-called client,

[05:44]

which is to say a program of my own

[05:47]

that's going to use this OpenAI

[05:50]

software. And then I'm going to go ahead

[05:51]

and ask this software for a response.

[05:54]

And I'm going to set that equal to

[05:56]

client.responses.create

[05:59]

whatever all that means. And then inside

[06:01]

of these parentheses I'm going to say

[06:03]

the following. The input I want to give

[06:05]

to this underlying API is quote unquote

[06:08]

something like in one sentence

[06:11]

what is CS50? Much like I would ask

[06:13]

ChatGPT itself. If you're familiar with

[06:15]

things like ChatGPT and AI more

[06:17]

generally nowadays, you know there's

[06:18]

this thing called models which are like

[06:19]

statistical models that ultimately drive

[06:21]

what the AIs can do. I'm going to go

[06:23]

ahead and say model equals quote unquote

[06:24]

gpt-5, which is the latest and greatest

[06:27]

version at least as of today. Now down

[06:29]

in my terminal window I'm going to run a

[06:31]

different command python of chat.py and

[06:35]

so long as I have made no typographical

[06:37]

errors in this program I should be able

[06:39]

to ask OpenAI, not with chatgpt.com, but

[06:44]

with my own code for the answer to some

[06:46]

question. But I want to know what the

[06:47]

answer to that question is. So, I

[06:49]

actually want to print out that response

[06:51]

by saying print(response.output_text). In

[06:55]

other words, these 10 lines, and it's

[06:57]

not even 10 lines because a few of them

[06:58]

are blank, I've implemented my own

[07:00]

chatbot that at the moment is hard-coded,

[07:02]

that is, permanently configured to only

[07:04]

answer one question for me. And let's

[07:07]

see, with the cross of the fingers, CS50

[07:10]

is Harvard University's introductory

[07:11]

computer science course, the

[07:12]

intellectual enterprises of computer

[07:14]

science and the art of programming.

[07:15]

(weirdly familiar), covering problem

[07:16]

solving, algorithms, data structures, and

[07:18]

more using languages like C, Python, and

[07:19]

SQL. Okay, interesting. But let's make

[07:21]

the program itself more dynamic. Suppose

[07:24]

you wanted to write code that actually

[07:26]

asks the human what their question is

[07:28]

because very quickly might we want to

[07:30]

learn something more than just this one

[07:31]

question. So up here, I'm going to go

[07:33]

and change my code and type something

[07:35]

like this. Type prompt equals input with

[07:39]

parenthesis. More on this another time,

[07:41]

too. But what I'm going to ask the user

[07:43]

for is to give me an actual prompt. That

[07:45]

is a question that I want this AI to

[07:47]

answer. And down here, what you'll

[07:49]

notice, even if you've never programmed

[07:51]

before, is that I can do something

[07:52]

somewhat intuitive insofar as line

[07:55]

five is now asking the human for input.

[07:57]

Let's just stipulate that this equal

[07:58]

sign means store that answer in a

[08:00]

variable called prompt where variables

[08:02]

just like in math x, y, or z. Let's go

[08:04]

ahead and store that in prompt. So the

[08:06]

input I want to give to OpenAI now is

[08:09]

that actual prompt. So, it's a

[08:10]

placeholder containing whatever

[08:12]

keystrokes the human typed in. If I now

[08:14]

run that same command again, python of

[08:16]

chat.py, hit enter, cross my fingers,

[08:20]

I'll see now dynamic prompting. So,

[08:23]

what's a question I might want to ask?

[08:24]

Well, let's just say it again. In one

[08:26]

sentence, whoops, in one sentence, what

[08:29]

is CS50? Question mark. Enter. And now

[08:32]

the answer comes back as probably

[08:36]

roughly the same but a little bit

[08:38]

different a variant thereof. But maybe

[08:40]

we can distill this even more

[08:42]

succinctly. How about let's run it

[08:43]

again. Python of chat.py and let's say

[08:45]

in one word what is CS50 and see if the

[08:49]

underlying AI obliges.
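
As a rough sketch of the chat.py described so far, the version below reads a prompt from the user and passes it to the API. It assumes the third-party openai package is installed and that an API key is available in the OPENAI_API_KEY environment variable. The helper names ask and main are illustrative and do not come from the lecture.

```python
def ask(client, prompt, model="gpt-5"):
    """Send one prompt through the client and return the reply as text."""
    response = client.responses.create(input=prompt, model=model)
    return response.output_text


def main():
    # Imported here so the helper above can be exercised without the package.
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    prompt = input("Prompt: ")
    print(ask(client, prompt))
```

Adding a call to main() at the bottom of chat.py and running python chat.py in the terminal would prompt for a question and print whatever the model returns.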

[08:52]

And after a pause: "Course." In a word. So

[08:56]

that's not all that incorrect. And maybe

[08:57]

we can have a little fun with this. Now

[08:59]

how about in one word which is

[09:04]

which is better maybe Harvard

[09:08]

or Stanford question mark hope you

[09:11]

picked right let's see the answer is

[09:16]

"Depends." Okay, so it would not in fact oblige,

[09:19]

but notice what I keep doing in this

[09:21]

code I keep providing a prompt as the

[09:23]

human like in one sentence in one word

[09:25]

well if you want the AI to behave in a

[09:27]

certain way, why don't we just tell the

[09:29]

underlying system to behave in that way

[09:31]

so I the human don't have to keep asking

[09:33]

it in one sentence in one sentence in

[09:34]

one word so we can actually introduce

[09:36]

one other feature that you'll hear

[09:38]

discussed in industry nowadays which is

[09:40]

not only a prompt from the user which

[09:42]

I'm going to now temporarily rename to

[09:44]

user prompt just to make clear it's

[09:46]

coming from the user I'm going to also

[09:47]

give it what's called a system prompt

[09:50]

by setting this equal to some

[09:52]

standardized instructions that I want

[09:55]

the AI to respect like limit your answer

[09:59]

to one sentence, quote unquote. And now,

[10:02]

in addition to passing in as input the

[10:05]

user prompt, I'm going to actually tell

[10:07]

OpenAI to use these instructions

[10:10]

coming from this other variable called

[10:13]

system prompt. So, in other words, I'm

[10:15]

still using the same underlying service,

[10:17]

but I'm handing it now not only what the

[10:18]

user typed in, but also this

[10:20]

standardized text limit your answer to

[10:22]

one sentence. So, the human like me

[10:24]

doesn't have to do that anymore. Let's

[10:26]

now go back to my terminal. run Python

[10:27]

of chat.py once more and this time

[10:30]

we'll be prompted but now I can just ask

[10:32]

what is CS50 question mark and I'll

[10:35]

likely get a correct and similar answer

[10:39]

to before and indeed it's Harvard

[10:41]

University's flagship introductory

[10:42]

computer science course dot dot dot so

[10:44]

seems spot on too but now we can have

[10:46]

some fun with this too and you might

[10:48]

know that these GPTs nowadays have sort

[10:50]

of personalities you can make them

[10:52]

obliged to behave in one way or another

[10:54]

why don't we go into our system prompt

[10:55]

here and say something silly like

[10:57]

pretend you're a cat. And now let's go

[11:00]

back to the prompt one final time. Run

[11:03]

Python of chat.py. Prompt again will be

[11:06]

say what is CS50? And with a final

[11:09]

flourish of hitting enter, what do we

[11:11]

get back?
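
A sketch of the revised program follows, under the same assumptions as before (the third-party openai package is installed and OPENAI_API_KEY is set). The system prompt is passed through the instructions parameter of client.responses.create. The name ask_with_system_prompt is illustrative only, and combining both instructions into one string is an assumption about how the demo was configured.

```python
# Assumption: both standing instructions are combined into one string.
SYSTEM_PROMPT = "Limit your answer to one sentence. Pretend you're a cat."


def ask_with_system_prompt(client, user_prompt,
                           system_prompt=SYSTEM_PROMPT, model="gpt-5"):
    """Send the user's prompt plus standing instructions; return the reply."""
    response = client.responses.create(
        input=user_prompt,           # what the human typed
        instructions=system_prompt,  # how the model is asked to behave
        model=model,
    )
    return response.output_text


def main():
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    user_prompt = input("Prompt: ")
    print(ask_with_system_prompt(client, user_prompt))
```

Keeping the system prompt in its own variable means the human no longer has to repeat "in one sentence" with every question; only the standing instructions change.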

[11:14]

CS50 is Harvard University's

[11:15]

introductory computer science course

[11:16]

teaching programming algorithms, data

[11:18]

structures, and problem solving. And

[11:19]

it's available free online. Meow. So

[11:21]

that was enough to coerce this

[11:23]

particular behavior. So this is to say

[11:26]

that with programming, you have the

[11:27]

ability in like 10 lines of text, not

[11:30]

all of which you might understand yet,

[11:32]

but that's the whole point of a class

[11:33]

like this to build fairly powerful

[11:35]

things, maybe silly things like this,

[11:37]

but in fact, it's using these same

[11:39]

primitives that CS50 has its own virtual

[11:41]

rubber duck. And we'll talk more about

[11:42]

this in the weeks to come, but long

[11:44]

story short, in the world of

[11:45]

programming, it's kind of a thing to

[11:47]

keep a rubber duck literally on your

[11:49]

desk or really any inanimate cute object

[11:51]

like this because when you are

[11:53]

struggling with some problem, some bug

[11:55]

or mistake in your code and you don't

[11:57]

have a friend, a teaching assistant, a

[11:58]

parent or someone else who's more

[12:00]

knowledgeable than you about code, well,

[12:02]

you literally are encouraged in

[12:03]

programming circles to like talk to the

[12:05]

rubber duck. And it's through that

[12:07]

process of just verbalizing your

[12:08]

confusion and organizing your thoughts

[12:10]

enough to convey it to another person or

[12:13]

duck in this case that so often that

[12:14]

proverbial light bulb goes off and you

[12:16]

realize ah I'm being an idiot now I hear

[12:18]

in my own thoughts the illogic or the

[12:20]

mistake I'm making and you solve that

[12:22]

problem as well. So CS50 drawing

[12:24]

inspiration from this will give to you a

[12:27]

virtual duck in computer form and in

[12:29]

fact among the other URLs you'll use

[12:31]

over the course of the semester is that

[12:32]

here, cs50.ai, which is also built into

[12:35]

that previous URL, cs50.dev, whereby

[12:38]

these are the AIs you can use in CS50 to

[12:41]

solve problems and you are encouraged to

[12:43]

do so as you'll see in the course

[12:44]

syllabus it is not reasonable it is not

[12:46]

allowed to use AI based software other

[12:48]

than CS50's own, be it Claude, Gemini,

[12:51]

ChatGPT, or the like, but it is reasonable and

[12:53]

very much encouraged along the way to

[12:55]

turn not only to humans like me your

[12:57]

teaching assistant and others in the

[12:59]

class but to CS50's own AI based

[13:01]

software and what you'll find is that

[13:02]

this virtual duck is designed to behave

[13:05]

as close to a good human tutor as you

[13:08]

might expect from an actual human in the

[13:10]

real world knows about CS50 knows how to

[13:12]

lead you to a solution ideally without

[13:14]

simply spoiling it and providing it

[13:16]

outright. So with that said that's sort

[13:19]

of the endgame to be able to write code

[13:21]

like that and more. But let's really

[13:23]

start back at the beginning and see how

[13:25]

we can get from zeros and ones that

[13:28]

computers speak all the way back to

[13:30]

artificial intelligence. So computer

[13:32]

science is in the name of the course,

[13:34]

computer science 50. But what is that?

[13:35]

Well, it's really just the study of

[13:37]

information. How do you represent it?

[13:39]

How do you process it? And very much

[13:40]

germane to computer science is what the

[13:42]

world calls computational thinking,

[13:43]

which is just the application of ideas

[13:45]

from computer science or CS to problems

[13:49]

generally in the real world. And in

[13:51]

fact, that's ultimately, I dare say,

[13:52]

what computer science really is. It's

[13:54]

about problem solving. And even though

[13:56]

we use computers, you learn how to

[13:58]

program along the way, these are really

[13:59]

just tools and methodologies that you

[14:02]

can leverage to solve problems. Now,

[14:04]

what does that mean? Well, a problem is

[14:06]

perhaps most easily distilled into a

[14:08]

simple picture like this. We've got some

[14:10]

input, which is like the problem we want

[14:11]

to solve, and the output, which is the

[14:13]

goal we want, the solution there, too.

[14:15]

And then somewhere in the middle here is

[14:16]

the proverbial black box, the sort of

[14:18]

secret sauce that gets that output from

[14:20]

that input. So, this then I would say in

[14:22]

essence is problem solving and thus

[14:24]

computer science. But we have to agree,

[14:27]

especially if we're going to use

[14:28]

devices, Macs, PCs, phones, whatever.

[14:30]

How do we all represent information, the

[14:32]

inputs and the outputs, in some

[14:34]

standardized way? Is it with English? Is

[14:36]

it with something else? Well, you all

[14:37]

probably know, even if you're not

[14:38]

computer people, that at the end of the

[14:40]

day, computers somehow use zeros and ones

[14:43]

entirely. That is their entire alphabet.

[14:45]

And in fact, you might be familiar

[14:47]

already with certain such systems. So

[14:49]

the unary uh notation, which means you

[14:52]

essentially use single digits like

[14:54]

fingers on your hand. For instance,

[14:55]

unary aka base one is something you can

[14:57]

do on your own human hand. So for

[14:59]

instance, with one human hand, how high

[15:00]

can I count?

[15:02]

>> All right, so hopefully 1 2 3 4 5 and if

[15:05]

you want to count to six and uh to 11

[15:08]

and 10 and so forth, you need to, you

[15:10]

know, take out another hand or your toes

[15:12]

or the like because it's fairly

[15:13]

limiting. But if I think a little

[15:14]

harder, instead of just using unary,

[15:16]

what if I use a different system

[15:18]

instead? What about something like

[15:20]

binary? Well, how high if you think a

[15:22]

little harder can you count on one human

[15:23]

hand?

[15:25]

So 31 says someone who studied computer

[15:27]

science before. But why is that? It's

[15:29]

kind of hard to imagine, right? Because

[15:31]

1 2 3 4 5 seems to be the five possible

[15:34]

patterns. But that's only when you're

[15:36]

looking at the totality of fingers that

[15:37]

are actually up. Five in total or four

[15:39]

in total or one or the like. But what if

[15:41]

we take into account the pattern of

[15:43]

fingers that are up and we just

[15:44]

standardize what each of those fingers

[15:46]

represent? So maybe we all agree like a

[15:49]

good computer would too that maybe no

[15:51]

fingers up means the number zero. And if

[15:53]

we want to count to one, let's go with

[15:54]

the obvious. This is now one. But

[15:57]

instead of two being this, which was my

[15:59]

first instinct, maybe two can just be

[16:02]

this. A single second finger up like

[16:05]

this. And that means we could now use

[16:08]

two fingers up to represent three. I'll

[16:11]

propose we can use just one middle

[16:13]

finger up to offend everyone, but

[16:15]

represent four. I could maybe use these

[16:18]

two fingers with some difficulty to

[16:20]

represent five, six, seven. I'm already

[16:24]

up to seven having used only three

[16:26]

fingers. And in fact, if we keep going

[16:27]

higher and higher, I bet I can get as

[16:30]

high as 31 for 32 possible combinations,

[16:33]

but the first one was zero. So that's as

[16:35]

high as we can count. So we'll make this

[16:36]

connection in just a moment. But what I

[16:38]

started to do there is something called

[16:40]

base 2. Instead of just having fingers

[16:42]

up or fingers down, I'm taking into

[16:44]

account the positions of those fingers

[16:46]

and giving meaning to like this finger

[16:49]

here, this finger here, this finger here

[16:51]

and so forth. Different weights if you

[16:52]

will. So the binary system is indeed all

[16:56]

computers understand. And you might be

[16:58]

familiar with some terminology here.
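
To make the finger-counting argument concrete, here is a small sketch (the function names are illustrative) comparing unary, where only the number of raised fingers matters, with binary, where each finger's position carries a weight of 1, 2, 4, 8, or 16.

```python
from itertools import product


def unary_max(fingers):
    """In unary, the value is just how many fingers are up."""
    return fingers


def binary_value(pattern):
    """Value of a tuple of 0s and 1s, most significant finger first."""
    value = 0
    for bit in pattern:
        value = value * 2 + bit
    return value


def binary_patterns(fingers):
    """Every possible up/down pattern for the given number of fingers."""
    return list(product((0, 1), repeat=fingers))


patterns = binary_patterns(5)
print(unary_max(5))                            # 5
print(len(patterns))                           # 32
print(max(binary_value(p) for p in patterns))  # 31
print(binary_value((0, 0, 1, 1, 1)))           # 7
```

Five fingers give 32 distinct patterns, and because the first pattern stands for zero, the highest reachable number is 31.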

[17:00]

Binary digit is not really something

[17:01]

anyone really says, but the shorthand

[17:03]

for that is going to be bit. So if

[17:06]

you've heard of bits and we'll soon see

[17:08]

bytes and then kilobytes and megabytes

[17:10]

and gigabytes and terabytes and more.

[17:12]

This just refers to a bit meaning a

[17:15]

single binary digit either a zero or a

[17:19]

one. A zero is perhaps most simply

[17:22]

represented by just like turning maybe

[17:24]

keeping a finger down or in the world of

[17:26]

computers which have access to

[17:28]

electricity be it from the wall or maybe

[17:30]

a battery. You know what we could do? We

[17:33]

could just decide sort of universally

[17:35]

that when a light bulb is off, that

[17:38]

thing represents a zero. And when the

[17:39]

light bulb is on, that thing's going to

[17:41]

represent a one instead. Now, why is

[17:44]

this? Well, electricity is such a simple

[17:45]

thing, right? It's either flowing or

[17:47]

it's not. And we don't even have to

[17:49]

therefore worry about how much of it is

[17:51]

flowing. And if you vaguely remember

[17:53]

a little bit about voltage, we can sort

[17:54]

of be like zero volts, nothing's there

[17:56]

available for us. Or maybe it's 5 volts

[17:58]

or something else in between. But what's

[18:00]

nice about binary only using zeros and

[18:03]

ones is that it maps really nicely to

[18:05]

the real world by like throwing a light

[18:07]

switch on and off. You can represent

[18:09]

information by just using a little bit

[18:11]

of electricity or the lack thereof. So

[18:13]

what do I mean by this? Well, suppose we

[18:15]

want to start counting using binary

[18:18]

zeros and ones only. Well, let's think

[18:20]

of them metaphorically as like akin to

[18:22]

these light bulbs here. And in fact, let

[18:24]

me grab a few of these light bulbs and

[18:25]

let me propose that if we want to

[18:27]

represent the number zero, well, it

[18:29]

stands to reason that here a single light

[18:32]

bulb that is off can be agreed upon as

[18:34]

representing zero. Now, in practice,

[18:37]

computers don't have little light bulbs

[18:38]

inside, but they do have little switches

[18:40]

inside. Millions of tiny little things

[18:43]

called transistors that if turned on can

[18:45]

allow it to capture a little bit of

[18:47]

electricity and effectively turn on a

[18:49]

metaphorical bulb or the switch can go

[18:51]

off. the transistor can go off and

[18:53]

therefore let the electricity dissipate

[18:54]

and you have just now a zero.

[18:56]

Unfortunately, even though I can let

[18:59]

some electricity, there's the battery I

[19:02]

mentioned is required. Even though we

[19:04]

might have some electricity available to

[19:06]

us, I can therefore count to one. But

[19:08]

how do I go about counting?

[19:11]

Hardware problem. How do I go about

[19:13]

counting higher than one with just a

[19:16]

light bulb?

[19:18]

Yeah. So, I need more of them. So, let

[19:20]

me grab another one here. And now I

[19:22]

could put it next to it. And this, too,

[19:24]

I'll claim is just still the number one.

[19:26]

But if I want to turn two of them on,

[19:28]

well, that would mean I could count to

[19:30]

two. And if I maybe grab another one,

[19:32]

now I can count as high as three. But

[19:34]

wait a minute. I'm doing something wrong

[19:36]

because with three human fingers, how

[19:37]

high was I able to count?

[19:40]

So, seven in total, starting at zero.

[19:42]

So, I've done something wrong here. But

[19:43]

let me be a little more clever then

[19:45]

about the pattern that I'm actually

[19:46]

using. Perhaps this can still be one.

[19:50]

But just like my finger went up and only

[19:52]

one finger in the second version of

[19:54]

this, this can be what we represent as

[19:58]

two. Which one do I want to turn on as

[20:00]

three? Your left or your right?

[20:03]

>> So, your right, because now this matches

[20:05]

what I was doing with my fingers a

[20:06]

moment ago. And I claimed we could

[20:08]

represent three like this. If we want to

[20:09]

represent four, that's fine. We have to

[20:11]

turn that off, this off, and this on.

[20:15]

And that's somehow four. And let's go

[20:17]

all the way up to seven. Which ones need

[20:19]

to be on to represent the number seven?

[20:21]

All right. So, all of them here. Now, if

[20:23]

you're not among those who just sort of

[20:25]

naturally said all of them, like what

[20:27]

the heck is going on? How do half the

[20:29]

people in this room know what these

[20:30]

patterns are supposed to be? Well, maybe

[20:32]

you're remembering what I did with my

[20:33]

fingers. But it turns out you're already

[20:35]

pretty familiar with systems like this,

[20:37]

even if you might not have put a name to

[20:39]

it. So in the human world, the real

[20:41]

world, most of us deal every day with

[20:43]

the so-called base 10 system, otherwise

[20:45]

known as decimal, dec implying 10,

[20:47]

because in the decimal system you have

[20:49]

10 digits available to you, 0 through 9.

[20:52]

In the binary system, we only had two, bi

[20:54]

implying two: 0 and 1. And in unary we

[20:58]

had just one, a single digit there or

[21:01]

not. So in the decimal system, we just

[21:03]

have more of a vocabulary to play with.

[21:05]

And yet you and I have been doing this

[21:07]

since grade school. So this is obviously

[21:08]

the number 123. But why? It's

[21:11]

technically just three symbols. 1 2 3.

[21:14]

But for most of us, your mind's eye goes,

[21:16]

okay, 123. Pretty obvious, pretty

[21:18]

natural. But at some point, you like me

[21:21]

were probably taught that this is the

[21:22]

ones place and this is the tens place

[21:25]

and this is the hundreds place and so

[21:28]

forth. And the reason that this pattern

[21:30]

of symbols 1 2 3 is 123 is that we're

[21:35]

all doing some quick mental math and

[21:36]

realizing, well, that's 100 * 1 + 10 * 2 +

[21:39]

1 * 3. Oh, okay. There's how we get 100

[21:42]

+ 20 + 3 gives us the number we all know

[21:45]

mathematically is 123. Well, it turns

[21:48]

out whether you're using decimal or

[21:50]

binary or other base systems that we'll

[21:52]

talk about later in the course, the

[21:53]

system is still fundamentally the same.
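
The place-value arithmetic just described can be sketched in a few lines. The function name from_digits is an illustrative choice: it multiplies each digit by the matching power of the base and sums the results, so the same code handles decimal and the binary light-bulb patterns alike.

```python
def from_digits(digits, base):
    """Interpret a list of digits (most significant first) in the given base."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if not 0 <= digit < base:
            raise ValueError(f"digit {digit} is not valid in base {base}")
        total += digit * base ** position
    return total


print(from_digits([1, 2, 3], 10))  # 100*1 + 10*2 + 1*3 = 123
print(from_digits([1, 1, 1], 2))   # 4*1 + 2*1 + 1*1 = 7
print(from_digits([1, 0, 0], 2))   # 4*1 + 2*0 + 1*0 = 4
```

Only the base changes between the calls; the column weights are always successive powers of that base.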

[21:55]

Let's kind of generalize this away.

[21:56]

Here's a three-digit number in some base

[21:59]

system specifically in decimal. And I

[22:01]

know that only because of the

[22:02]

placeholders that I've got on top of

[22:04]

each of these numbers. But if we do a

[22:05]

little bit of math here, 1 10 100 1,000

[22:09]

10,000 and so forth. What's the pattern?

[22:11]

Well, technically this is 10 to the 0, 10

[22:13]

to the 1, 10 to the 2, and so forth. And we're

[22:16]

using 10 because we can use as many as

[22:19]

10 digits under each of those columns.

[22:22]

But if we take some of those digits away

[22:23]

and go from decimal down to binary, the

[22:26]

motivation being it's way easier for a

[22:29]

computer to distinguish electricity

[22:30]

being on or off than coming up with like

[22:34]

10 unique levels of electricity to

[22:36]

distinguish among. You could do it. It

[22:38]

would be annoying and difficult to build

[22:39]

in hardware. It's so much

[22:41]

simpler to just say on and off. It's a

[22:45]

nice simple world that way. So let's

[22:47]

change the base from 10 to two. And what

[22:50]

does this get us? Well, if we now

[22:51]

redo the math, that's 2 to the 0 is 1. 2

[22:54]

to the 1 is 2. 2 to the 2 is 4. So the

[22:57]

ma the mental math is now about to be

[22:59]

the same, but the columns represent

[23:01]

something a little bit different. So for

[23:03]

instance, if I turn all of these off

[23:05]

again, such that I've got off, off, off,

[23:08]

otherwise known as 0 0 0, it's zero

[23:12]

because it's 4 * 0 + 2 * 0 + 1 * 0 still

[23:17]

gives me zero. By contrast, if I turn on

[23:20]

maybe just this one all the way over on

[23:22]

the left, well, that's four times one

[23:26]

because on represents one and off

[23:28]

represents 0 plus 2 * 0 + 1 * 0, that

[23:31]

gives me four. And if I turn both of

[23:33]

these on, such that all three of them

[23:36]

are now on, on on aka one, one, one,

[23:40]

that's 4 * 1 + 2 * 1 + 1 * 1. That then

[23:45]

gives me seven. And we can keep adding

[23:47]

more and more bits to this. In fact, if

[23:49]

we go all the way up uh numerically,

[23:51]

here's how we would represent in binary

[23:53]

the number you and I know as zero.

[23:55]

Here's how we would represent one.

[23:58]

Here's how we would represent two and

[24:00]

three and four and five. And you can

[24:03]

kind of see in your mind's eye now

[24:04]

because I only have zeros and ones and

[24:06]

no twos or threes, not to mention nines,

[24:09]

I'm essentially going to be carrying a

[24:11]

one in a moment if we were to be doing

[24:12]

some math. So to go from five to six,

[24:15]

that's why the one ends up in the middle

[24:17]

column. To go to seven here gives us now

[24:19]

1 1 1 or on on on. How do I represent

[24:22]

eight

[24:24]

using ones and zeros? Yeah,

[24:27]

>> we need to add another digit.

[24:28]

>> Yeah. So we're going to need to add

[24:29]

another digit. We need to throw hardware

[24:31]

at the problem using an additional digit

[24:33]

so that we actually have a column

[24:35]

representing eight. Now, as an aside,

[24:37]

and we'll talk about this before long,

[24:38]

if you don't have an additional digit

[24:41]

available, if your computer doesn't have

[24:43]

enough memory, so to speak, you might

[24:45]

accidentally count from 0 1 2 3 4 5 6 7

[24:49]

and then accidentally end up back at

[24:51]

zero. Because if there's no room to

[24:53]

store the fourth bit, well, all you have

[24:56]

is part of the number. And this is going

[24:58]

to create all sorts of problems then

[25:00]

ultimately in the real world. So let me

[25:02]

go ahead and put these back and propose

[25:04]

that we have a system now. If you agree

[25:08]

to sort of count numbers in this way via

[25:10]

which we can represent information in

[25:12]

some standard way and all the device

[25:14]

underneath the hood needs is a bit of

[25:16]

electricity to make this work. It's got

[25:18]

to be able to turn things on aka use

[25:20]

some transistors and it's got to be able

[25:21]

to turn those things off so as to

[25:23]

represent zeros instead of ones. But the

[25:26]

reality is like two bits, three bits,

[25:28]

four bits aren't very useful in the real

[25:30]

world because even with three bits you

[25:32]

can count to seven, with four you can

[25:33]

count to 15. These aren't very big

[25:36]

numbers. So it tends to be more common

[25:38]

to actually use units of measure of

[25:40]

eight bits at a time. A byte is just

[25:43]

that: one byte is eight bits. So if

[25:46]

you've ever used the vernacular of

[25:47]

kilobytes, megabytes, gigabytes, that's

[25:49]

just referring to some number of bits.

[25:52]

But eight of them together compose one

[25:55]

individual byte. So here for instance is

[25:58]

a byte's worth of bits. Eight of them

[26:00]

total. I've added all the additional

[26:02]

placeholders. And what number does this

[26:04]

represent in decimal even though you're

[26:06]

looking at eight binary digits?

[26:09]

>> Just zero cuz like literally every

[26:10]

column is a zero. Now this is a bit more

[26:13]

of mental math, unless you know it

[26:15]

already. What if I change all of the

[26:16]

zeros to ones? I turn all eight light

[26:18]

bulbs on. What number is this?

[26:21]

>> Yeah. So 255. Now for those of you

[26:24]

who didn't get that instantly, that's

[26:25]

fine. You could certainly do the math

[26:26]

manually. I dare say some of you have

[26:28]

some prior knowledge of how to do this

[26:30]

sort of system. But 255 means that if

[26:34]

you start counting at zero and you go

[26:35]

all the way up to 255, okay, that's 256

[26:39]

total possibilities once you include

[26:41]

zero in the total number of patterns of

[26:44]

zeros and ones. And this is just going

[26:45]

to be one of these common numbers in

[26:47]

computer science. 256. Why? Because it's

[26:50]

referring to eight of something. 2 to

[26:52]

the 8 gives you 256. And so you're going

[26:56]

to commonly see certain values like

[26:57]

that. 256. Back in the day, computers

[26:59]

could only show 256 colors on the

[27:02]

screen. Certain graphics formats

[27:04]

nowadays that you might download can

[27:06]

only use as many as 256 colors because,

[27:08]

as we'll see, they're only using, for

[27:10]

instance, eight bits, and therefore they

[27:13]

can only represent so many colors of the

[27:15]

rainbow as a result. So this then is how

[27:20]

we might go from just zeros and ones

[27:22]

electricity inside of a computer to

[27:24]

storing actual numbers with which we're

[27:26]

familiar. And honestly we can go higher

[27:27]

than 255. What do you need to count

[27:29]

higher than 255? A 9th bit, a 10th bit,

[27:32]

an 11th bit and so forth. And it turns

[27:34]

out common conventions nowadays and

[27:36]

we'll see this in code too is to use as

[27:38]

many as 32 bits at a time. So that's a

[27:42]

good chunk of bits. And anyone want to

[27:43]

ballpark how high you can count if

[27:45]

you've got 32 bits available to you?

[27:50]

Oh, fewer people now. Yeah, in the back.

[27:53]

>> Yeah. So, it's roughly 4 billion. And

[27:55]

it's technically two billion if you also

[27:56]

want to represent negative numbers, but

[27:58]

we'll revisit that question. But 2 to

[28:00]

the 32nd power is roughly 4 billion.
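
The place-value arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names (`bits_to_int`, `patterns`, `increment`) are made up here and are not from the lecture. It sums powers of two for a pattern of bits, counts how many patterns a given number of bits allows, and mimics the wrap-around back to zero that happens when there is no room for a fourth bit.

```python
def bits_to_int(bits: str) -> int:
    """Sum each bit times its column's power of two (rightmost column is 2**0)."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total


def patterns(n_bits: int) -> int:
    """Number of distinct patterns of zeros and ones that n_bits can hold."""
    return 2 ** n_bits


def increment(value: int, n_bits: int) -> int:
    """Add one, keeping only the lowest n_bits (any higher bit is lost)."""
    return (value + 1) % patterns(n_bits)


print(bits_to_int("000"))       # 0
print(bits_to_int("100"))       # 4
print(bits_to_int("111"))       # 7
print(bits_to_int("11111111"))  # 255
print(patterns(8))              # 256
print(patterns(32))             # 4294967296, roughly 4 billion
print(increment(7, 3))          # 0: with only three bits, 7 + 1 wraps around
```
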

[28:03]

However, nowadays it's even more common

[28:05]

with the Macs and PCs you might have on

[28:07]

your laps and even your phones nowadays

[28:08]

to use 64 bits, which is a big enough

[28:11]

number that I'm not even sure offhand

[28:13]

how to pronounce it. That's a lot of

[28:15]

permutations. That's 2 to the 64

[28:17]

possible permutations, but that's

[28:19]

increasingly commonplace. And as an

[28:21]

aside, just to dovetail things with our

[28:22]

discussion of AI, among the reasons that

[28:24]

we're living through over these past few

[28:26]

years, especially this crazy interesting

[28:28]

time of AI, is because computers have

[28:31]

been getting so much faster,

[28:33]

exponentially so over time, they have so

[28:35]

much more memory available to them.

[28:37]

There's so much data out there on the

[28:38]

internet in particular to train these

[28:40]

models that it's an interesting

[28:42]

confluence of hardware now actually

[28:44]

meeting the mathematics and statistics

[28:45]

that we'll talk about later in the class

[28:47]

that ultimately make tools like the cat

[28:49]

we just built possible. But of course

[28:52]

computers are not all math and in fact

[28:53]

we'll use very little math per se in

[28:55]

this class. And so let's move away

[28:57]

pretty quickly from just zeros and ones

[28:59]

and talk about letters of the alphabet.

[29:00]

Say in English here is the letter A.

[29:03]

Suppose you want to use this letter in

[29:05]

an email, a text message, or any other

[29:07]

program. What is the computer doing

[29:08]

underneath the hood? How can the

[29:10]

computer store a capital letter A in

[29:14]

English? If at the end of the day, all

[29:16]

the computer has access to is a source

[29:18]

of electricity from the wall or from a

[29:21]

battery and it has a lot of switches

[29:24]

that it can turn on and off and treat

[29:26]

the electricity in units of 8 or 32 or

[29:28]

64 or whatever.

[29:31]

How might a computer represent a letter

[29:33]

A?

[29:36]

>> Yeah, we need to give it an identity so

[29:38]

to speak as an integer. In other words,

[29:40]

at the end of the day, if your entire

[29:42]

canvas, so to speak, consists only of

[29:44]

zeros and ones. Like that is going to be

[29:46]

the answer to every question today. You

[29:48]

only have zeros and ones as the solution

[29:50]

to these problems. We just need to agree

[29:53]

what pattern of zeros and ones and

[29:54]

therefore what integer, what number

[29:57]

shall be used to represent the letter A.

[30:00]

And hopefully when we look at that

[30:02]

pattern of zeros and ones in the right

[30:04]

context, we'll indeed see it as an A. So

[30:06]

if we look inside of a computer so to

[30:07]

speak in the context of like a text

[30:09]

messaging program or a word processor or

[30:12]

anything like that, that pattern shall

[30:14]

be interpreted hopefully as a capital

[30:15]

letter A. But if I open up macOS's or

[30:17]

Windows or my phone's calculator

[30:19]

program, I would want that same pattern

[30:21]

of zeros and ones to be interpreted

[30:24]

instead as a number. If I open up

[30:26]

Photoshop, as we'll soon see, I want

[30:28]

that same pattern of zeros and ones to

[30:30]

be interpreted as a color presumably,

[30:33]

not to mention videos and sound and so

[30:34]

forth, but it's all just zeros and ones.

[30:36]

And so, even though I, when writing that

[30:38]

chat program a few minutes ago, didn't

[30:41]

have to worry about telling the

[30:42]

computer, oh, this is text, this is a

[30:44]

number, this is something else. We'll

[30:46]

see as we write code ourselves that you

[30:48]

as the programmer will have control over

[30:50]

telling the computer how to treat some

[30:53]

pattern of zeros and ones telling it

[30:55]

this is a number, this is a color, this

[30:56]

is a letter or something else. Um, how

[30:59]

do we represent the letter A? Well,

[31:01]

turns out a bunch of humans in a room

[31:03]

years ago decided ah this pattern of

[31:05]

zeros and ones shall be known globally

[31:08]

as a capital letter English A. What is

[31:12]

that number if you do the quick mental

[31:13]

math? So indeed 65 because we had a one

[31:16]

in the 64's place and a one in the ones

[31:19]

place. So 65 that's just sort of it. It

[31:21]

would have been nice if it were just the

[31:22]

number one or maybe the number zero. But

[31:25]

at least after the capital letter A,

[31:27]

they kept things consistent such that if

[31:30]

you want to represent a letter B, it's

[31:32]

going to be 66. Capital letter C, it's

[31:34]

going to be 67. Why? Because the humans

[31:36]

in this room, a bunch of Americans at

[31:38]

the time, standardized on what's called

[31:39]

ASCII, the American Standard Code for

[31:42]

Information Interchange. Doesn't matter

[31:43]

what the acronym represents, but it was

[31:46]

just a mapping. Someone on a piece of

[31:47]

paper essentially started writing down

[31:49]

letters of the alphabet and

[31:50]

corresponding numbers so that computers

[31:53]

subsequently could all speak that same

[31:55]

standard representation. And here's an

[31:57]

excerpt thereof. In this case, we're

[31:59]

seeing seven bits worth, but eventually

[32:00]

we ended up using eight bits in total to

[32:03]

represent letters. And some of these are

[32:05]

fairly cryptic. Maybe more on those

[32:06]

another time. But down here, if we

[32:08]

highlight just one column, we'll see

[32:09]

that indeed on this cheat sheet, 65 is

[32:12]

capital A, 66 is B, 67 is C, and so

[32:16]

forth. So, why don't we do a little

[32:18]

exercise here? What pattern of zeros and

[32:21]

ones do I see here? I've got three

[32:23]

bytes, so three sets of eight bits. And

[32:26]

even though there's no placeholders now

[32:28]

over the columns, what is this

[32:31]

number?

[32:33]

It's 60. Yeah. Yeah. So, we got the

[32:36]

ones, twos, fours, 8s, uh, 16, 32, 64s

[32:42]

column. So, indeed, this is going to be

[32:43]

the number 72. 72. This is not what

[32:46]

computer scientists spend their day

[32:48]

doing. This is just to reinforce what it

[32:49]

is we just looked at. And I'll spoil it.

[32:51]

The rest of these numbers are 72 73 33.

[32:54]

And anyone in this room could have done

[32:56]

that if you took out a piece of paper,

[32:57]

figured out what the columns are, and

[32:59]

just did a bit of quick mental or

[33:01]

written math. But this is to say,

[33:03]

suppose that you just got a text message

[33:04]

or an email that if you had the ability

[33:07]

to look underneath the hood of the

[33:09]

computer and see what pattern of zeros

[33:11]

and ones did you just receive over the

[33:13]

internet. Suppose that pattern of zeros

[33:14]

and ones was three bytes of bits, which

[33:18]

when you do the math are the numbers 72,

[33:20]

73, 33. Well, here's the cheat sheet

[33:23]

again. What message did you just get?

[33:26]

>> Yeah. So, it's HI. Why? Because 72 is

[33:29]

H and 73 is I. Now, some of you said HI

[33:32]

fairly emphatically. Why? Well, 33 turns

[33:35]

out, and you wouldn't know this unless

[33:36]

you looked it up or someone told you, is

[33:39]

an exclamation point. So, literally, if

[33:41]

you were to text someone like right now,

[33:42]

if you haven't already, HI exclamation

[33:45]

point in all caps, you would essentially

[33:47]

be sending three bytes of information

[33:49]

somehow over the internet to that

[33:51]

recipient. And because their phone

[33:53]

similarly understands ASCII because it was

[33:56]

programmed years ago to do so, it knows

[33:58]

to show you HI exclamation point and not

[34:02]

a number, three numbers no less, or colors

[34:04]

or something else altogether. So here we

[34:07]

then have HI! as three numbers in a row here.
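
As a rough sketch of the decoding just described, here is how the three bytes from the slide could be turned back into text in Python. The variable names are illustrative; `int(..., 2)` and `chr` are standard Python built-ins, used here as a stand-in for whatever a phone does internally.

```python
# Three bytes (eight bits each), as they might arrive over the internet.
message = "01001000 01001001 00100001"

# Read each group of eight bits as a base-2 integer.
numbers = [int(byte, 2) for byte in message.split()]

# Look each integer up as a character (ASCII values match Python's chr()).
text = "".join(chr(n) for n in numbers)

print(numbers)  # [72, 73, 33]
print(text)     # HI!

# Going the other way: text to numbers.
print([ord(c) for c in "ABC"])  # [65, 66, 67]
```
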

[34:10]

Um what else is worth noting here? Well,

[34:12]

there's some fun sort of trivia embedded

[34:15]

even in this cheat sheet. So here again

[34:16]

is A, B, C, D, E, F, G, and so forth, 65 on

[34:20]

down. Let me just highlight over here

[34:23]

the lowercase letters 97 98 99 and so

[34:27]

forth. If I go back and forth, does

[34:30]

anyone notice the consistent pattern

[34:33]

between these two?

[34:35]

>> Yeah. So, the lowercase letters are 32

[34:38]

away from the uppercase letters. Well,

[34:40]

how do we know that? Well, 97 - 65 is

[34:43]

Yeah. 32. Uh 98 - 66 is okay. 32. And

[34:48]

that pattern continues. What does this

[34:50]

mean? Well, computers know how to do

[34:51]

this. Most normal humans don't need this

[34:53]

information. But what it means is if you

[34:55]

are representing in binary with your

[34:58]

transistors on and off representing some

[35:00]

pattern and this is the pattern

[35:01]

representing capital letter A, which is

[35:03]

why we have a one in the 64's place and

[35:05]

a one in the ones place. How does a

[35:08]

computer go about lowercasing this same

[35:11]

letter? Yeah,

[35:15]

>> perfect. All the computer has to do is

[35:17]

change this one bit in the 32's place to

[35:21]

a one because that has the effect

[35:22]

mathematically per our discussion of

[35:24]

adding the number 32 to whatever it is.
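
The single-bit trick can be sketched in Python with bitwise operators. The helper names (`to_lower`, `to_upper`) and the constant `CASE_BIT` are hypothetical, chosen for this example, and the sketch assumes the input is a single English letter.

```python
CASE_BIT = 0b00100000  # the bit in the 32s place


def to_lower(letter: str) -> str:
    """Turn the 32s bit on (assumes a single letter A-Z or a-z)."""
    return chr(ord(letter) | CASE_BIT)


def to_upper(letter: str) -> str:
    """Turn the 32s bit off (assumes a single letter A-Z or a-z)."""
    return chr(ord(letter) & ~CASE_BIT)


print(format(ord("A"), "08b"))  # 01000001
print(format(ord("a"), "08b"))  # 01100001
print(to_lower("A"))            # a
print(to_upper("b"))            # B
print(ord("a") - ord("A"))      # 32
```

Only the third bit from the left differs between the two patterns printed above, which is why changing that one bit is enough.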

[35:27]

So it turns out you can force text from

[35:29]

uppercase to lowercase or back by just

[35:31]

changing a single bit inside of that

[35:34]

pattern of eight bits in total. All

[35:37]

right, why don't we maybe reinforce this

[35:38]

with another quick exercise? We have an

[35:40]

opportunity perhaps here for um maybe to

[35:42]

give you some stress balls right at the

[35:44]

very start of class. Could we get eight

[35:45]

volunteers to come up on stage? Maybe

[35:48]

over here and over here and uh over here

[35:51]

on the left. Let me go all the way on

[35:52]

the right. Uh let's see. Okay, the high

[35:55]

hand here. The the hand that's highest

[35:57]

there. Yes, we're making eye contact.

[35:58]

How about all the way? Wait, let's see.

[36:00]

Let's go here in the crimson sweatshirt

[36:03]

here. And how about in the the white

[36:04]

shirt here? Come on up. Did I count

[36:06]

correctly? Let's see.

[36:10]

Come on down. The eight of you. I didn't

[36:13]

count right, did I? 1 2 3 4 5 6. It's

[36:17]

ironic that I'm not counting correctly.

[36:18]

Eight here. How about on the left in

[36:20]

gray? Okay. Oh, and uh Okay. In black

[36:23]

here. Come on down. All right.

[36:24]

Hopefully, this is eight. 1 2 3 4 5 6 7.

[36:30]

I pretty. Okay. Eight. There we go. All

[36:32]

right. So, let's go ahead and do the

[36:34]

following exercise. I've got some sheets

[36:36]

of paper preprinted here. If each of you

[36:38]

indeed want to do exactly what you're

[36:39]

doing and line up from left to right,

[36:40]

each of you is going to represent a

[36:42]

placeholder essentially. So we have over

[36:45]

here the ones place all the way over

[36:47]

here. And then we have the two's place

[36:50]

and the four's place and the eights

[36:54]

place, 16,

[36:56]

32 64 128. And we come bearing a

[37:00]

microphone if each of you want to say a

[37:02]

quick hello. your name, maybe your dorm

[37:03]

or house, and something besides computer

[37:05]

science that you're studying or want to.

[37:08]

>> Hi, I'm Oh, that's loud. Okay. I'm

[37:10]

Allison. I'm a freshman in Matthews and

[37:15]

um I like climbing and I'm thinking of

[37:17]

CS and econ.

[37:19]

>> Number two.

[37:20]

>> Hi, I'm Lily. I'm in Hurlbut this year

[37:23]

and I'm thinking of doing CS in

[37:25]

government.

[37:26]

>> Nice to meet.

[37:27]

>> Hi. Hi, I'm Sean. I'm in Canaday Hall

[37:30]

and I'm thinking of doing astrophysics

[37:32]

and CS.

[37:33]

>> Welcome.

[37:34]

>> Hi, I'm Jordan. I'm doing applied math

[37:36]

with a specialization in CS and econ.

[37:40]

And um I'm in Wigglesworth and I like

[37:43]

going to the gym.

[37:44]

>> Okay, [laughter]

[37:45]

nice. 16.

[37:46]

>> Hi, I'm Shiv. I'm studying Macki and I'm

[37:49]

in Canaday.

[37:50]

>> Nice.

[37:51]

>> Hi, I'm Sophia. I'm in the think of

[37:55]

doing electrical engineering.

[37:57]

>> Welcome. Hi, my name is Marie and I'm in

[38:00]

Canaday B and I really like CS, physics,

[38:03]

and astrophysics.

[38:05]

>> Hi, I'm Alyssa. I'm in Holworthy. I'm

[38:09]

also thinking of studying math or

[38:11]

physics and I also like to climb.

[38:13]

>> Nice. Welcome to you all. So, on the

[38:16]

backs of their sheets of paper, they

[38:18]

have a little cheat sheet that's

[38:19]

describing what they should do in each

[38:20]

of three rounds. We're going to spell

[38:22]

out together a three-letter word. You all

[38:24]

as the audience have a cheat sheet above

[38:26]

you that represents numbers to letters.

[38:28]

These folks don't necessarily know what

[38:30]

they're spelling. They only know what

[38:31]

they individually are spelling. So if

[38:33]

your sheet of paper tells you to

[38:34]

represent a zero in a given round, just

[38:36]

kind of stand there awkwardly, no hands

[38:38]

up. But if you're told on your sheet of

[38:40]

paper to represent a one, just raise a

[38:42]

single hand to make obvious to the

[38:43]

audience that you're representing a one

[38:45]

and not a zero. And the goal here is to

[38:47]

figure out what we are spelling using

[38:48]

this system called ASCII. All right,

[38:51]

round one, execute.

[38:55]

What number is this here?

[39:00]

I'm hearing You can just shout it out.

[39:02]

What number?

[39:04]

>> 66 or B. So, you're spelling B. All

[39:07]

right, hands down. Round two.

[39:11]

More math.

[39:15]

Feel free to shout it out.

[39:18]

>> Oh, I heard it. Yeah. 79, which is

[39:20]

>> O. Okay, so we have B O. Hands down.

[39:23]

Third and final round. Execute

[39:27]

number

[39:30]

87.

[39:31]

>> Yes. 87. Which is the letter?

[39:34]

>> W. Which spells

[39:36]

>> bow? If you want to take your bow now.

[39:39]

>> Ah, okay. Here we go. You guys can keep

[39:41]

those.

[39:44]

Okay. Thank. All right. You guys can

[39:46]

head back. Thank you to our volunteers

[39:48]

here. Very nicely done. We indeed

[39:50]

spelled out bow and that's just because

[39:52]

we all standardized on representing

[39:54]

information in exactly the same way

[39:56]

which is why when you type b on your

[39:59]

phone or your computer the recipient

[40:00]

sees the exact same thing but what's

[40:03]

noteworthy in this discussion is that

[40:05]

you can't spell a huge number of words

[40:07]

like yeah English okay we've got that

[40:09]

covered but odds are you're noticing

[40:11]

depending on your own background what

[40:12]

human languages you read or speak

[40:14]

yourself um that a whole bunch of

[40:16]

symbols might be missing from your

[40:17]

keyboard for instance we have accented

[40:19]

characters here in a lot of Asian

[40:21]

languages there's so many more glyphs

[40:22]

than we could have even fit in that

[40:23]

cheat sheet of numbers and letters and

[40:26]

so ASCII is not the only system that the

[40:28]

world uses it was one of the earliest

[40:30]

but we've moved on in modern times to a

[40:33]

superset of ASCII that's generally known

[40:35]

as Unicode and Unicode uses so many more

[40:38]

bits than ASCII that we even have room for

[40:41]

all of these little things that we seem

[40:42]

to send constantly nowadays these are

[40:45]

obviously images that you might send

[40:47]

with your phone or your computer but

[40:48]

they're technically characters.

[40:51]

They're technically just patterns of

[40:52]

zeros and ones that have similarly been

[40:54]

standardized around the world to look a

[40:57]

certain way, but they're this is an

[40:59]

emoji keyboard in the sense that you're

[41:01]

sending characters. You're not sending

[41:03]

images per se. The characters are

[41:05]

displayed as images obviously, but

[41:07]

really these are just like characters in

[41:09]

a different font and that font happens

[41:11]

to be very colorful and graphical as

[41:13]

well. So, Unicode instead of using just

[41:16]

seven or eight bits, which if you do the

[41:18]

quick mental math, if ASCII only used

[41:20]

seven or let's say eight bits, how many

[41:23]

possible characters can you represent in

[41:25]

ASCII alone?

[41:27]

256. Because if we do that quick mental

[41:29]

math, 2 to the 8th is 256 possibilities,

[41:31]

like that's it. That is that's enough

[41:33]

for English because you can cram all the

[41:34]

uppercase letters, the lowercase

[41:36]

letters, the numbers, and a whole bunch

[41:37]

of punctuation as well. But it's not

[41:39]

enough for certain other punctuation

[41:41]

symbols, not to mention many other human

[41:43]

languages. And so the Unicode

[41:45]

Consortium, its charge in life has been

[41:47]

to come up with a digital representation

[41:49]

of all human language, past, present,

[41:52]

and hopefully future by using not just

[41:55]

seven or eight bits, but maybe 16 bits

[41:57]

per character, 24 bits, or heck, even 32

[42:01]

bits per character. And per before, if

[42:03]

you've got as many as 32 bits available

[42:05]

to you, you can represent what, like 4

[42:07]

billion characters in total. And that's

[42:10]

just one of the reasons why these emoji

[42:11]

have kind of exploded in popularity and

[42:13]

availability. There's just so many darn

[42:15]

patterns. Like, what else are we going

[42:17]

to do with all of these zeros and ones?

[42:19]

But more importantly, emoji have been

[42:21]

designed to really represent people and

[42:23]

places and things and emotions in a way

[42:25]

that transcends human language. But even

[42:28]

then, they're somewhat open to

[42:29]

interpretation. In fact, here's a

[42:31]

pattern of I think 32 zeros and ones.

[42:35]

I'm guessing no one's going to do the

[42:36]

quick mental math here, but this

[42:37]

represents what decimal number if we do

[42:39]

in fact do out the math with that's

[42:41]

being the ones place all the way over to

[42:43]

the left. Well, that's the number

[42:44]

4,036,991,106.

[42:47]

Who knows what that is? It's not a and

[42:50]

it's nothing near a uppercase or

[42:52]

lowercase, but it is among the most

[42:54]

popular emoji that you might send

[42:56]

typically on your phone, laptop, or

[42:58]

other device. namely this thing here

[43:00]

face with tears of joy which odds are

[43:03]

you've sent or received recently but

[43:06]

interestingly even though many of you

[43:07]

might have iPhones and see and send the

[43:10]

same image you'll notice that if you see

[43:12]

a friend who's got Android or some other

[43:14]

device maybe you're using uh Meta's

[43:16]

messenger program or Telegram or some

[43:19]

other messaging service sometimes these

[43:21]

emoji look a little bit different why

[43:23]

because what Unicode has done is they

[43:25]

decided there shall exist an emoji known

[43:28]

as, excuse me, face with tears of

[43:31]

joy then Apple and Google and Microsoft

[43:34]

and others they're sort of free to

[43:36]

interpret that as they see fit. So what

[43:37]

you see on the screen here is a recent

[43:39]

version from iOS, Apple's operating

[43:41]

system. Google's version of the same

[43:43]

looks a little something like this. And

[43:44]

on Telegram, if you have animations

[43:46]

enabled, the same idea, face with tears

[43:48]

of joy is actually animated. But it's

[43:50]

the same pattern of zeros and ones in

[43:53]

each case. But again, they each

[43:55]

essentially have different graphical

[43:56]

fonts to present to you what each of

[43:59]

those images actually is. All right. So,

[44:02]

those are each, excuse me,

[44:04]

[clears throat] images.
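
To make the emoji discussion concrete, here is a small Python sketch. It rests on an assumption the lecture does not state: that the 32-bit pattern on the slide is the emoji's UTF-8 encoding. Under that assumption, the four bytes read as one big integer come out to the roughly 4 billion number mentioned above. `unicodedata.name` is part of Python's standard library.

```python
import unicodedata

emoji = "\U0001F602"  # one character, written by its Unicode code point

print(unicodedata.name(emoji))  # FACE WITH TEARS OF JOY
print(hex(ord(emoji)))          # 0x1f602

# Assumption: the pattern on the slide is the UTF-8 encoding of this character.
encoded = emoji.encode("utf-8")
print(len(encoded))                                  # 4 (bytes, so 32 bits)
print(" ".join(format(b, "08b") for b in encoded))   # 11110000 10011111 10011000 10000010
print(int.from_bytes(encoded, "big"))                # 4036991106

# A plain English letter still fits in a single byte in the same encoding.
print(len("A".encode("utf-8")))  # 1
```
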

[44:08]

So, those are each images. How is the

[44:11]

computer representing them though? At

[44:13]

the end of the day, we've represented

[44:15]

numbers, we've represented letters, but

[44:18]

how about these things here, colors? So,

[44:21]

how do we represent red or green or

[44:24]

blue, not to mention every other color

[44:25]

in between? At the end of the day, we

[44:28]

only have one canvas at our disposal.

[44:30]

Yeah,

[44:32]

so integers is the exact same answer as

[44:34]

before. We just need to agree on what

[44:36]

number do we use for red, what do we use

[44:38]

for green, what do we use for blue, and

[44:40]

we can come up with some standardized

[44:42]

pattern for this. In fact, one of the

[44:43]

most common techniques for doing this

[44:45]

and the common one of the most common

[44:46]

ways to do this in the real world is to

[44:48]

use a combination of three colors

[44:50]

together. Some amount of red, some

[44:52]

amount of green, and some amount of

[44:54]

blue, and mix them together to get most

[44:56]

any color of the rainbow that you might

[44:57]

want. This is sort of a a picture of

[44:59]

something I grew up with back in the day

[45:01]

where in like middle school when we'd

[45:02]

watch movies or some kind of show in

[45:04]

like in in class, we would kind of uh

[45:07]

the projector screen would be over here.

[45:09]

This is an old-school projector with

[45:11]

three different lenses, one of which

[45:13]

projects some amount of green, some

[45:14]

amount of red, some amount of blue. And

[45:16]

so long as the lenses are correctly

[45:18]

oriented to all point at the same circle

[45:21]

or like rectangular region on the

[45:22]

screen, you would see any number

[45:24]

[clears throat] of colors coming to life

[45:26]

in the old school video. I still

[45:28]

remember all these years later, we would

[45:29]

kind of sit and lean up against it

[45:31]

because it was super warm and you could

[45:32]

hear it easy way to fall asleep back in

[45:34]

grade school. But we use the same

[45:36]

fundamental color system nowadays as

[45:39]

well, including in modern programs like

[45:40]

Photoshop. So let's abstract that away.

[45:43]

focus on just three colors, some amount

[45:45]

of red, green, and blue. And let's

[45:47]

suppose for the sake of discussion that

[45:49]

we want to mix together like a medium

[45:51]

amount of red, a medium amount of green,

[45:53]

and just a little bit of blue. For

[45:55]

instance,

[45:57]

let's suppose that we'll use 72 amount

[46:00]

of red, 73 amount of green, and

[46:04]

33 amount of blue, RGB. Now, why

[46:07]

these numbers? Well, in the context of

[46:09]

ASCII or Unicode, which is just a

[46:11]

superset thereof, what does this

[46:13]

spell?

[46:15]

>> Hi. But again, if you were instead to

[46:17]

open a file containing these three

[46:19]

numbers or really these three bytes of

[46:22]

bits in Photoshop, you would hope that

[46:25]

they're going to be interpreted not as

[46:27]

letters on the screen, but as

[46:30]

the color of a dot on the screen

[46:32]

instead. So it turns out that

[46:35]

typically when you have three of these

[46:37]

numbers together each of them is using a

[46:39]

single byte. So eight bits. So you can

[46:42]

have zero red or 255 red. Zero green or

[46:46]

255 green or 0 to 255 of blue. So zero

[46:50]

is none, 255 is the max. So if we mix

[46:53]

these together, imagine that just like

[46:56]

that projector consolidating these three

[46:58]

colors into one central point. Anyone

[47:00]

want to guess what you're going to get

[47:02]

if you mix some red, some green, some

[47:03]

blue in those amounts in way back?

[47:08]

>> Yeah, you're going to get a dark shade

[47:09]

of yellow. I've brightened it up a

[47:11]

little bit for the projector here, but

[47:12]

you're going to get roughly this shade

[47:14]

of yellow. And we could play with these

[47:15]

numbers all day long and get similar

[47:17]

results if we want to represent

[47:19]

different colors as well. And indeed,

[47:21]

whether it's Photoshop or some other

[47:22]

program, you can actually combine these

[47:24]

amounts in all sorts of ratios to get

[47:27]

different colors. So if you had 0 0 0,

[47:29]

so no red, no green, no blue, take a

[47:31]

guess as to what color that's going to

[47:33]

be in the computer,

[47:34]

>> so it's going to be black, like the

[47:35]

absence of all three of those colors.

[47:37]

But if you mix the maximal amount of

[47:38]

each of those 255, red and green and

[47:41]

blue, that's going to give you white.
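
Here is a minimal Python sketch of the idea that the same three bytes can be read as text or as a color, depending on what the program is told. The names `data`, `BLACK`, and `WHITE` are illustrative only.

```python
data = bytes([72, 73, 33])  # the same three bytes as before

# Interpreted as ASCII text:
as_text = data.decode("ascii")

# Interpreted as one pixel's amounts of red, green, and blue (0 to 255 each):
red, green, blue = data
as_color = (red, green, blue)

print(as_text)   # HI!
print(as_color)  # (72, 73, 33)

BLACK = (0, 0, 0)        # none of any color
WHITE = (255, 255, 255)  # the maximum of all three
```
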

[47:43]

Now, if any of you have made web pages

[47:45]

before or used programs like Photoshop,

[47:47]

you might have seen numbers like 00 or

[47:50]

FF. Long story short, that's just

[47:52]

another base system for representing

[47:54]

numbers between zero and 255 as well.
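
As a preview of that other base system (base 16, or hexadecimal), this Python sketch shows how 00 and FF line up with 0 and 255. The helper `rgb_to_hex` is a hypothetical name, and the `#RRGGBB` layout it produces is the convention commonly used on web pages rather than anything defined in this lecture.

```python
def rgb_to_hex(red: int, green: int, blue: int) -> str:
    """Write each 0-255 amount as two base-16 digits, web style."""
    return f"#{red:02X}{green:02X}{blue:02X}"


print(int("00", 16))                 # 0
print(int("FF", 16))                 # 255
print(rgb_to_hex(0, 0, 0))           # #000000 (black)
print(rgb_to_hex(255, 255, 255))     # #FFFFFF (white)
print(rgb_to_hex(72, 73, 33))        # #484921
```
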

[47:57]

But we'll come back to that mid-semester

[47:59]

when we make some of our own filters uh

[48:01]

in sort of an Instagram-like way,

[48:02]

manipulating images of our own. So,

[48:06]

where are these colors coming from or

[48:07]

where can we actually see them? Well,

[48:09]

here's just a picture of that same emoji

[48:10]

face with tears of joy. If I kind of

[48:12]

zoom in on that and maybe zoom in again,

[48:15]

you can start to see if you blow it up

[48:17]

enough or if you put your eyes close

[48:18]

enough to the device, sometimes you can

[48:20]

actually see individual dots or squares.

[48:23]

These are generally known as pixels. And

[48:26]

they're just the individual dots that

[48:27]

collectively compose an image. Which is

[48:30]

to say that if each of these dots, which

[48:33]

is part of the image, is going to be a

[48:34]

distinct color. Like this one's yellow,

[48:37]

this one's brown, and then there's a

[48:38]

bunch in between. Well, you're using

[48:40]

some number of bits to represent each of

[48:43]

those pixels' colors. So, if you imagine

[48:46]

using the RGB system, that's 8 + 8 + 8

[48:50]

bits. So, that's 24 bits or three bytes

[48:54]

just to keep track of the color of each

[48:56]

and every one of these dots. So now, if

[48:59]

you think about having downloaded a GIF

[49:00]

at some point, a PNG file, um a

[49:04]

JPEG or any other file format, it's

[49:06]

usually measured in what file size? Like

[49:08]

megabytes, typically. That means millions

[49:10]

of bytes. Why? Because if it's a pretty

[49:12]

big photograph or pretty big image, each

[49:15]

of those dots takes up at least three

[49:17]

bytes it would seem. And if you do out

[49:19]

the math, if you've got thousands of dots,

[49:21]

each of which uses three bytes, you're

[49:23]

going to quickly get to megabytes, if

[49:25]

not even larger for things like say

[49:27]

videos. But again, it's just patterns of

[49:29]

zeros and ones. And so long as the

[49:31]

programmer knows what they're doing and

[49:33]

tells the computer how to interpret

[49:35]

those zeros and ones. And equivalently,

[49:37]

so long as the software knows, look at

[49:39]

these zeros and ones and interpret them

[49:40]

as numbers or letters or colors, we

[49:44]

should see what we intended to

[49:46]

represent. All right, so

[49:48]

that's colors and images. What about

[49:51]

how many of you kind of played with

[49:52]

these little flip books as a kid where

[49:54]

they've got like a hundred different

[49:55]

little pictures and you flip through

[49:56]

them really quickly and you see what

[49:58]

looks like animation in book form. Well,

[50:00]

this is essentially a video. So

[50:02]

therefore, what is a video or how can

[50:04]

you think of what a video is? It's just

[50:07]

a whole bunch of like images flying

[50:08]

across the screen either on paper or

[50:10]

digitally nowadays on your phone or your

[50:12]

laptop. And that's kind of nice because

[50:13]

we're sort of composing more interesting

[50:16]

media now based on these lower level

[50:18]

building blocks. And this is going to be

[50:19]

thematic. We literally started with

[50:20]

zeros and ones. We worked our way up to

[50:22]

letters. We then worked our way up to

[50:24]

sort of images and uh colors and thus

[50:27]

images. Now we're up at this level of

[50:29]

hierarchy in terms of video because

[50:31]

what's a video? It's like 30 images per

[50:34]

second flying across the screen or maybe

[50:37]

slightly fewer than that. That

[50:38]

collectively tricks our mind into

[50:40]

thinking we are seeing motion pictures.
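
As a back-of-the-envelope sketch of the arithmetic above, in Python: three bytes per pixel, and about 30 images per second for video. The 1920 by 1080 dimensions and the function names are assumptions for illustration. Real formats such as JPEG compress their data, so actual files are usually much smaller than these uncompressed figures.

```python
BYTES_PER_PIXEL = 3  # 8 bits each for red, green, and blue

def image_bytes(width, height):
    """Uncompressed size of one image, in bytes."""
    return width * height * BYTES_PER_PIXEL

def video_bytes(width, height, frames_per_second, seconds):
    """Uncompressed size of a video treated as a sequence of images."""
    return image_bytes(width, height) * frames_per_second * seconds

# Assumed dimensions (not from the lecture): 1920 x 1080 pixels.
print(image_bytes(1920, 1080))         # 6220800, roughly 6 megabytes
print(video_bytes(1920, 1080, 30, 1))  # 186624000, one second at 30 images per second
```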

[50:42]

And that's the old school term for

[50:44]

movies, but it literally is what it was.

[50:46]

Motion pictures: this film was

[50:48]

showing you 30 pictures per second and

[50:50]

it looks like motion even though you're

[50:52]

just looking at images much like this

[50:54]

flip book very quickly one after the

[50:56]

other. What about music? Well, how could

[50:58]

you go about representing musical notes

[51:01]

if again your only ingredients are zeros

[51:05]

and ones? Even if you're not a musician,

[51:07]

how do you represent music like that on

[51:09]

the screen here? Yeah. Okay. So, the

[51:12]

frequency like the tone that you're

[51:13]

actually hearing from the device. What

[51:15]

else might weigh in besides the

[51:17]

frequency of the note? Yeah.

[51:20]

>> So the speed of the note or maybe the

[51:21]

duration like if you think about a

[51:23]

physical piano like how long you're

[51:24]

holding the key down for or not. What

[51:26]

else? So the amplitude maybe how loud

[51:29]

like how hard did you hit the keyboard

[51:31]

to generate that sound. So let me

[51:33]

propose at the risk of simplifying we

[51:35]

could represent each of these notes

[51:36]

using three numbers, maybe 0 to 255 or

[51:39]

some other range that represents the

[51:41]

frequency or the pitch of the note, the

[51:43]

duration, and the loudness. And so long

[51:46]

as the person receiving a file

[51:48]

containing all of those zeros and ones

[51:50]

knows how to interpret them three at a

[51:52]

time, I bet you could share uh a musical

[51:55]

file with someone else that they could

[51:57]

hear in exactly the same way that you

[52:00]

yourself intended. Let me pause here to

[52:04]

see if there's any questions now because

[52:06]

we've already built our way up from

[52:07]

zeros and ones now to video and sound.
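
A rough sketch, in Python, of the three-numbers-per-note idea proposed above. The specific note values and the names encode and decode are made up for illustration. Real music formats are more involved; this only shows that a receiver who reads the bytes three at a time gets back what the sender intended.

```python
# Each note is (pitch, duration, loudness), every value from 0 to 255.
# These particular numbers are hypothetical.
notes = [(60, 4, 100), (62, 4, 100), (64, 8, 127)]

def encode(notes):
    """Flatten the notes into a sequence of bytes, three per note."""
    return bytes(value for note in notes for value in note)

def decode(data):
    """Interpret the bytes three at a time, as agreed with the sender."""
    return [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]

data = encode(notes)
print(len(data))              # 9 bytes for 3 notes
print(decode(data) == notes)  # True
```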

[52:12]

>> Yeah, in front.

[52:13]

>> How does the computer differentiate

[52:15]

between what the letter like 65 would be

[52:19]

and then what the number 65?

[52:20]

>> So, how does the computer distinguish

[52:22]

between the letter 65 and the number 65?

[52:24]

It's context dependent. So put simply

[52:27]

and we'll see this as early as next week

[52:29]

the programmer tells the computer how to

[52:31]

display the information either as a

[52:33]

number or a letter or equivalently once

[52:36]

programmed the software knows that when

[52:38]

it opens a GIF file or JPEG or something

[52:42]

else to interpret those zeros and ones

[52:45]

as colors instead of as like docx for a

[52:48]

Microsoft Word file or the like. Other

[52:51]

questions on any of these

[52:53]

representations?
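
To make the answer about 65 concrete, here is a small Python sketch. The same pattern of bits comes out as a number or as a letter depending only on how the program asks for it to be interpreted. The variable names are illustrative.

```python
bits = "01000001"     # one byte's worth of zeros and ones

value = int(bits, 2)  # interpret the pattern as a number
print(value)          # 65

letter = chr(value)   # interpret the same number as a letter
print(letter)         # A
```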

[52:56]

Yeah. In front.

[52:56]

>> Can we go over like the base 10 base 2 thing

[52:59]

like really briefly?

[53:00]

>> Sure. So, can we go over base 10 and

[53:02]

base two? So, base 10 is like literally

[53:04]

the numbers you and I use every day.

[53:06]

It's base 10 in the sense that you have

[53:08]

10 digits at your disposal. 0 through 9.

[53:11]

And any numbers you want to represent in

[53:13]

the real world must be composed using 0

[53:15]

through 9. The binary system or base 2

[53:18]

is fundamentally the same. It's just the

[53:20]

computer doesn't have access to two

[53:22]

through 9. It only has access to zero

[53:24]

and one. But much like the light bulbs I

[53:26]

was displaying here, you can simply

[53:28]

ascribe different weights to each of the

[53:31]

digits. So that instead of it being as

[53:33]

much as the ones place, the tens place,

[53:35]

and the hundreds place, if we more

[53:36]

modestly say the ones place, the twos

[53:38]

place, the fours place, we can use the

[53:40]

same system. In binary, you might need

[53:42]

to use more digits to count as high

[53:46]

because in 255, you can just write 255.

[53:49]

That's three digits in decimal. But in

[53:51]

binary, we've seen you need to use eight

[53:53]

such digits, which is more, but it's

[53:56]

still much better than unary, which

[53:57]

would have had 255 light bulbs on

[54:01]

instead.
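
A short Python sketch of the place-value idea just described. The function name from_digits is an assumption for illustration. The same loop handles base 10 and base 2; only the weight of each place changes.

```python
def from_digits(digits, base):
    """Add up the digits, left to right, weighting each place by the base."""
    total = 0
    for digit in digits:
        total = total * base + int(digit)
    return total

print(from_digits("255", 10))      # 255, three digits in decimal
print(from_digits("11111111", 2))  # 255, eight digits in binary
print(len(format(255, "b")))       # 8
print(len("1" * 255))              # 255 marks would be needed in unary
```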

[54:02]

>> And is

[54:04]

binary and base 2 like the same thing?

[54:06]

>> Is binary and base 2 the same thing?

[54:08]

Yes. Just like base 10 and decimal are

[54:11]

the same thing as well. And unary and

[54:13]

base 1 are the same thing as well. All

[54:15]

right. So let me just stipulate that

[54:18]

even though we sort of took this tour

[54:19]

quickly, at the end of the day, computers

[54:20]

only have zeros and ones at their

[54:22]

disposal. So again the answer to any

[54:23]

question as to how can we represent X is

[54:27]

going to somehow involve permuting those

[54:29]

zeros and ones into patterns or

[54:31]

equivalently into the numbers that they

[54:33]

represent. But if we now have a way to

[54:35]

represent all inputs in the world be it

[54:37]

letters, numbers, images, videos,

[54:39]

anything else and get output from some

[54:42]

problem-solving process like how do we

[54:44]

actually solve problems? Well, the

[54:45]

secret sauce in the middle here is

[54:46]

another term that you've probably heard

[54:47]

in the real world nowadays, which is

[54:49]

that of algorithm: step-by-step

[54:52]

instructions for solving some problem.

[54:54]

So, this ultimately is what computer

[54:56]

science really is about too, is not just

[54:58]

representing information, but somehow

[55:00]

processing it, doing something

[55:01]

interesting with it to actually solve

[55:03]

the problem that you've been provided as

[55:05]

input so you can output the correct

[55:07]

answer. Now, there's all sorts of

[55:09]

algorithms implemented in our phones and

[55:11]

in our Macs and PCs, and that's all

[55:13]

software is. It's an implementation in

[55:15]

code, be it C++ or Java or anything

[55:19]

else. Other languages exist too in code

[55:22]

that the computer understands, but it's

[55:24]

still just step-by-step instructions.

[55:26]

And among the things we'll learn in CS50

[55:27]

is how to express yourself in different

[55:29]

ways to solve problems, not only in

[55:31]

different languages, but using different

[55:33]

methodologies as well. Because as we'll

[55:35]

see, among the reasons we introduce

[55:36]

these several languages is you don't

[55:38]

just learn more and more languages that

[55:40]

allow you to solve the same problems.

[55:42]

Different languages will allow you to

[55:44]

solve different problems and even save

[55:46]

you time by being better tools for the

[55:48]

job. So here for instance on uh an

[55:50]

iPhone is maybe a bunch of contacts

[55:52]

which is presumably familiar where we

[55:54]

might have a whole bunch of friends and

[55:56]

family and whatnot alphabetized by first

[55:58]

name or last name and suppose we want to

[55:59]

find one such person like John Harvard

[56:01]

whose number here might be +1

[56:03]

949-468-2750.

[56:05]

Feel free to call or text him sometime.

[56:07]

Um this is the goal of this problem. If

[56:10]

we have our contacts app and I start

[56:12]

typing in John's name by first name or

[56:14]

last name, the autocomplete nowadays

[56:16]

kicks in and it somehow filters the list

[56:19]

down from my 10 friends or 100 friends

[56:21]

or a thousand friends into just the

[56:22]

single directory entry that matches. So

[56:25]

here too, back in the days of RG&B um

[56:29]

projector, we had uh phone books like

[56:31]

this here too. Um I'm pleased to say

[56:33]

thanks to our friend Alexis, this is the

[56:35]

largest phone book that we've used for

[56:36]

this demonstration. Uh, this is an old

[56:38]

school phone book that's essentially the

[56:40]

same thing as our contacts app or

[56:41]

address book nowadays whereby I've got a

[56:43]

whole bunch of names and numbers

[56:46]

alphabetically sorted by first name or

[56:47]

last name, whatever, and corresponding

[56:49]

to each of those is a number. So, back

[56:51]

in the day and frankly even nowadays in

[56:53]

your phones, how do you go about finding

[56:55]

someone in a phone book or your contacts

[56:57]

app? Well, you could very naively just

[56:59]

start at the beginning and look down and

[57:01]

just turn one page at a time looking for

[57:04]

John Harvard in this case. Now, so long

[57:06]

as I'm paying attention, this

[57:07]

step-by-step process will get me to John

[57:11]

Harvard. Like, this is a correct

[57:12]

algorithm, even though you might kind of

[57:15]

object to how I'm doing this. Why? Like,

[57:18]

what's bad about this algorithm?

[57:21]

>> It's just slow. I mean, this is crazy

[57:22]

slow. If there's like a thousand pages

[57:24]

in this phone book, which looks like

[57:25]

there are, like this could take me as

[57:26]

many as a thousand pages, or maybe he's

[57:28]

roughly in the middle, like 500 pages.
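
A minimal Python sketch of this first, page-by-page algorithm. The made-up phone book (one name per page) and the function name are assumptions for illustration. It counts how many pages get looked at before the name turns up.

```python
def linear_search(pages, target):
    """Look at one page at a time from the front.

    Returns (page_index, pages_looked_at); page_index is None if absent.
    """
    looked_at = 0
    for index, names in enumerate(pages):
        looked_at += 1
        if target in names:
            return index, looked_at
    return None, looked_at

# A hypothetical 1,000-page book with one name per page.
book = [[f"Person {i:04d}"] for i in range(1000)]

print(linear_search(book, "Person 0499"))  # (499, 500): roughly the middle
print(linear_search(book, "Person 0999"))  # (999, 1000): the worst case
```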

[57:30]

Like, that's crazy. That's really rather

[57:32]

slow, especially if I'm going to do this

[57:33]

again and again. Well, what if I do it a

[57:35]

little smarter? Grade school, I sort of

[57:37]

learned how to count two at a time. So,

[57:38]

2 4 6 8 10 12 14 16 18. Again, if I'm

[57:44]

paying attention, I'll get there twice

[57:46]

as fast because I'm counting two at a

[57:48]

time. But is that algorithm step by step

[57:50]

correct?

[57:51]

And I'm seeing no, but why?

[57:55]

>> I might skip over John Harvard. So, just

[57:57]

by bad luck and kind of with 50/50

[57:59]

probability, he's going to be sandwiched

[58:01]

between two of the pages. Now, I don't

[58:03]

have to abort this algorithm altogether.

[58:04]

I could just as soon as I get

[58:06]

past the J section if we're doing it by

[58:08]

first name. I could just double back one

[58:10]

page and just make sure that I haven't

[58:12]

missed him. So, it's recoverable. And

[58:14]

this algorithm therefore is sort of

[58:15]

twice as fast plus one extra step maybe

[58:18]

to double back. But that's arguably

[58:20]

otherwise a bug or a mistake in the

[58:22]

algorithm if I don't fix it

[58:23]

intelligently. But what did we do back

[58:25]

in the day? And what does your iPhone or

[58:26]

Android phone do? What they typically do

[58:28]

is they go roughly to the middle, look

[58:31]

physically or virtually down. They see,

[58:33]

"Oh, I'm in the M section." And so,

[58:35]

which side is John Harvard on? To the

[58:37]

left or to the right? So, he's to the

[58:39]

left. So, I could literally now

[58:44]

Jesus Christ.

[58:47]

We talked about this before class that

[58:49]

this might be more Oh my god. There we

[58:52]

go. We can tear the problem in half.

[58:54]

Thank you. [applause]

[58:59]

It's been a while. We can tear the

[59:01]

problem in half. We know that John

[59:03]

Harvard is to the left. So, I can throw

[59:06]

half of the problem away if uh

[59:08]

dramatically, such that I've now gone from

[59:10]

a thousand-page problem to 500 pages

[59:13]

instead. What now can I do? I can go

[59:14]

roughly to the middle here and maybe I'm

[59:16]

in the E section. So, I went a little

[59:18]

too far back to the left, but I kept it

[59:19]

simple and I just divided so that I can

[59:21]

conquer this problem, if you will. And

[59:23]

if I'm in the E section now, is John

[59:24]

Harvard to the left or to the right? To

[59:26]

the right. So I can again Jesus Christ.

[59:32]

Tear the problem in half. And now, thank

[59:35]

you. So now John Harvard again is going

[59:38]

to be in this half. I can throw this

[59:39]

half away. So now I've gone from 1,000

[59:41]

to 500 to 250. And I can repeat, repeat,

[59:43]

repeat down to 125. Half of that, half

[59:46]

of that, half of that until I'm left

[59:47]

with finally just a single page. And

[59:50]

John Harvard is hopefully now on this

[59:51]

page such that I can call him or not at

[59:54]

all at which point this is all sort of

[59:55]

for naught. But what's powerful about each

[59:58]

of those algorithms is that the sort of

[60:00]

good, better, and best, like they all get

[60:02]

the job done conditional on the second

[60:04]

one having that little fix just to make

[60:06]

sure I don't miss John Harvard between

[60:07]

two pages but they're fundamentally

[60:10]

different in their efficiency and the

[60:12]

quality of their design. And this is

[60:14]

really representative of one of the

[60:15]

emphases of a class like this. It's not

[60:17]

just about writing correct code or

[60:19]

getting the job done, but doing it well

[60:22]

and doing it quickly. Using the least

[60:24]

amount of CPU or computing resources,

[60:27]

using the minimal amount of RAM, using

[60:29]

the fewest number of people, using the

[60:31]

least amount of money, whatever your

[60:32]

constrained resource is, solving a

[60:35]

problem better. So that first algorithm

[60:37]

step-by-step instructions was all about

[60:40]

doing something like this whereby the

[60:43]

first algorithm if we plot things on a

[60:45]

grid like this we have on the x-axis a

[60:49]

representation of the size of the

[60:50]

problem. So this would mean small

[60:52]

problem like zero pages. This would mean

[60:54]

big problem like a thousand pages. And

[60:56]

on the y or vertical axis we have some

[60:58]

measurement of time. So this is the

[61:00]

number of seconds or the number of page

[61:01]

turns whatever your metric actually is.

[61:04]

So this would be uh not much time at

[61:06]

all, so fast. This would be a lot of

[61:08]

time, so slow. So what's the

[61:10]

relationship if we just roughly draw

[61:12]

these three algorithms? Well, the first

[61:13]

one is technically a straight line. And

[61:15]

we'll describe that as n. The slope is n

[61:17]

because if you think of n as a number

[61:19]

for the number of pages, well, there's a

[61:21]

one-to-one relationship in the first

[61:23]

algorithm as to how many times I have to

[61:25]

turn the page based on how many pages

[61:27]

there actually are. And you can think

[61:29]

about this in the extreme. If I was

[61:30]

looking for someone whose name started

[61:32]

with Z, I might have to go through like

[61:34]

a thousand darn pages to get to that

[61:36]

person whose name started with Z, unless

[61:38]

again I do something hackish and just

[61:40]

kind of cheat and go to the end. If we

[61:42]

execute these algorithms again and again

[61:43]

the same way, that's going to be pretty

[61:45]

slow. But the second algorithm was

[61:47]

pretty much twice as fast plus that one

[61:49]

extra step potentially. But it's still a

[61:51]

straight line because if there's a

[61:53]

thousand pages and I'm dividing the

[61:55]

problem and I'm doing two pages at a

[61:57]

time, well that's like n divided by two

[61:59]

steps plus one give or take. But it's

[62:01]

still a straight line, but it's

[62:04]

still better. Notice if this is the size

[62:06]

of the problem, a thousand pages for

[62:08]

instance, we'll notice that the first

[62:09]

algorithm took literally twice as much

[62:12]

time as the second algorithm. So we're

[62:14]

doing better already. But the third

[62:16]

algorithm fundamentally is going to look

[62:18]

something like this. And if you remember

[62:20]

your logarithms, so to speak, sort of the

[62:22]

opposite of an exponential, this curve

[62:24]

is so much lower and flatter, if you

[62:27]

will, than either of these two

[62:29]

mathematically. More on this another

[62:30]

time. The slope is going to be like log

[62:32]

base 2 of n or just logarithmic in

[62:35]

nature. But what it means is that it's

[62:37]

growing very very very slowly. It's

[62:40]

still going up. It's never going to

[62:41]

flatline and go perfectly horizontal,

[62:43]

but it goes up very slowly. Why? Well,

[62:45]

if you think about two towns nearby,

[62:47]

like Cambridge on this side of the river

[62:48]

and the town of Allston on the other,

[62:50]

suppose that they still have phone books

[62:52]

like this one, and they merge their

[62:54]

phone books for whatever reason. So,

[62:55]

overnight, we go from a thousand-page

[62:57]

phone book to a 2,000-page phone book.

[63:00]

The first algorithm is going to take

[63:01]

literally twice as long as will the

[63:03]

second one because we're only going

[63:04]

through it one or two pages at a time.

[63:07]

But if the phone book size doubles from

[63:09]

this year, for instance, to next year,

[63:11]

you can kind of in your mind's eye think

[63:13]

about the green line. It's not going to

[63:15]

go up that much higher. Why? Well,

[63:18]

practically speaking, even if the phone

[63:20]

book becomes 2,000 pages long. Well, how

[63:24]

many more times do you have to tear or

[63:26]

divide that problem in half?

[63:29]

>> Just one. Because you're taking a 1,000-

[63:31]

page bite out of it, or a 500, then a

[63:33]

250. you're taking much bigger bites out

[63:35]

of it than just one or two at a time.
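
A rough Python sketch of the comparison above, counting worst-case steps for each of the three approaches on a 1,000-page and a 2,000-page book. The function names are made up for illustration, and the counts are simplified: n page turns, then n divided by 2 plus one to double back, then the number of times the book can be halved.

```python
def one_at_a_time(n):
    """First algorithm: up to n page turns."""
    return n

def two_at_a_time(n):
    """Second algorithm: about n / 2 turns, plus one step to double back."""
    return n // 2 + 1

def halving(n):
    """Third algorithm: how many times n pages can be torn in half."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps

for n in (1000, 2000):
    print(n, one_at_a_time(n), two_at_a_time(n), halving(n))
# 1000 1000 501 9
# 2000 2000 1001 10
```

Doubling the book doubles the first two counts but adds only one step to the third.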

[63:38]

And so what computer science and what

[63:40]

algorithms and good design are

[63:42]

about is figuring out what is the logic

[63:44]

via which you can solve problems not

[63:46]

only correctly but efficiently as well.

[63:50]

And that then gives us these things

[63:51]

called algorithms. And when it comes

[63:53]

time to code, which we're about to do

[63:54]

too, code is just an implementation in

[63:57]

a language the computer understands of

[63:59]

an algorithm. Now this assumes that

[64:01]

we've come up with some digital way that

[64:03]

is to say zero-and-one-based way to

[64:05]

represent names and numbers. But

[64:08]

honestly we already did that. We came up

[64:09]

with ASCII and then Unicode to

[64:11]

represent the names. Representing

[64:13]

numbers is even easier than that. That's

[64:14]

really where we started. So code is just

[64:17]

about taking as input some standardized

[64:19]

representation of names and numbers and

[64:21]

spitting out answers. And that's truly

[64:23]

what iOS and Android are doing. When you

[64:25]

start doing autocomplete, they could be

[64:28]

searching from the top to the bottom,

[64:30]

which is fine if you've only got a few

[64:32]

friends and family in the phone. But if

[64:34]

you've got a thousand or if you've got

[64:35]

10,000 or if it's not a phone book

[64:38]

anymore, it's some database with lots

[64:39]

and lots of data. Well, it stands to

[64:41]

reason that it'd be nice maybe if the

[64:43]

computer kept it all alphabetized just

[64:45]

like that book and jumped to the middle,

[64:48]

then the middle of the middle, then the

[64:49]

middle of the middle of the middle, and

[64:50]

so forth. Why? because the speed is

[64:53]

going to be much much faster,

[64:55]

logarithmic in nature and not linear so

[64:58]

to speak in nature. But we'll revisit

[65:00]

those topics as well. But for now,

[65:02]

before we get into actual code, let's

[65:04]

talk for a moment about pseudocode. So

[65:07]

pseudocode is not one formal thing.

[65:09]

Every human will come up with their own

[65:10]

way of representing pseudocode. It's an

[65:13]

English-like or human-like formulation

[65:15]

of step-by-step instructions just using

[65:17]

terse, correct English or whatever human

[65:20]

language. So, for instance, if I want to

[65:22]

translate what I did somewhat

[65:23]

intuitively with that phone book by just

[65:25]

dividing in half, dividing in half into

[65:27]

step-by-step instructions, I could hand

[65:29]

you or, nowadays, like a robot or

[65:31]

something like that. Well, step one was

[65:33]

essentially to pick up the phone book,

[65:34]

which I did. Step two was I open to the

[65:37]

middle of the phone book in the third

[65:38]

and final algorithm. Step three was look

[65:40]

at the page as I did. Step four got a

[65:43]

little more interesting. Even though I

[65:44]

didn't verbalize this, presumably I was

[65:46]

asking myself a question. If the person

[65:48]

I'm looking for, John Harvard, is on the

[65:50]

page, then I would have called him right

[65:53]

then. But if he weren't on the page, if

[65:56]

he instead were earlier in the book, as

[65:59]

did happen, well then I'm going to go to

[66:01]

the left, so to speak, but more

[66:02]

methodically, I'm going to open to the

[66:04]

middle of the left half of the book.

[66:06]

Then I'm going to go back to line three.

[66:10]

That's interesting. We'll come back to

[66:11]

that in a moment. But else if the person

[66:12]

is later in the book, well, I'm going to

[66:14]

open to the middle of the right half of

[66:16]

the book and then go back to line three.

[66:20]

Now, let's pause here. Why do I keep

[66:22]

going back to line three? This would

[66:24]

seem to get me doing the same thing

[66:26]

forever endlessly.

[66:29]

But not quite. Why?

[66:31]

>> As soon as you hit the one the on.

[66:34]

>> Yeah. So because I am dividing the

[66:38]

problem in half, for instance, on line

[66:40]

six or line nine implicitly just based

[66:42]

on how I've written this, the problem's

[66:44]

getting smaller and smaller and smaller.

[66:46]

So it's fine if I keep doing the same

[66:47]

logic again and again because if the

[66:49]

problem's getting smaller, eventually

[66:50]

it's going to bottom out and I'm going

[66:52]

to have just one person on that page

[66:54]

that I want to call and so the algorithm

[66:56]

is done. But there is a perverse corner

[66:59]

case, if you will, and this is where

[67:00]

it's ever more important to be precise

[67:02]

when writing code and anticipate what

[67:04]

could go wrong. I should probably ask

[67:07]

one more question in this code, not just

[67:09]

these three. What might that question

[67:14]

be? Yeah.

[67:16]

>> John Harvard is in the book.

[67:18]

>> Yeah. So, if John Harvard is not in the

[67:19]

book, there's this corner case where

[67:21]

what if I'm just wasting my time

[67:22]

entirely and I get to the end of the

[67:24]

phone book and John Harvard's not there.

[67:25]

What should the computer do? Well, as an

[67:27]

aside, if you've ever been using your

[67:28]

Mac or PC or phone and the thing just

[67:30]

freezes or like the stupid little beach

[67:32]

ball starts spinning or something like

[67:33]

that and you're like, what is going on?

[67:35]

Some human at Google or Microsoft or

[67:37]

Apple or the like made a mistake. They

[67:40]

forgot for instance that fourth uncommon

[67:43]

but possible situation wherein if they

[67:45]

don't tell the computer how to handle

[67:46]

it, the computer's effectively going to

[67:48]

freak out and do something undefined

[67:51]

like just hang or reboot or do something

[67:53]

else. So we do want to add this else

[67:56]

quit altogether. So you have well-defined

[67:59]

behavior and truly think that the next

[68:01]

time your computer or phone

[68:02]

spontaneously reboots or dies or does

[68:05]

something wrong, it's probably not your

[68:07]

fault per se. Some other human

[68:09]

elsewhere did not write correct code.

[68:11]

They didn't anticipate cases like these.
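
The pseudocode walked through above, including the final "quit" case, might be sketched in Python like this. The function name and the short list of names are assumptions for illustration, and the list has to be alphabetized already for the halving to work.

```python
def search(names, target):
    """Return the index of target in the sorted list names, or None."""
    low, high = 0, len(names) - 1
    while low <= high:                # keep going back while pages remain
        middle = (low + high) // 2    # open to the middle
        if names[middle] == target:   # the person is on this page
            return middle
        elif target < names[middle]:  # earlier in the book: keep the left half
            high = middle - 1
        else:                         # later in the book: keep the right half
            low = middle + 1
    return None                       # else quit: the person is not in the book

# A hypothetical, already alphabetized phone book.
book = ["Alice", "Bob", "Charlie", "David", "John Harvard", "Mallory", "Zoe"]

print(search(book, "John Harvard"))  # 4
print(search(book, "Eve"))           # None
```

Without that last line the function would still end, but a version written with "go back to" steps and no stopping rule could repeat forever, which is the corner case described above.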

[68:14]

But now let's use some terminology here.

[68:16]

There's some salient ideas that we're

[68:17]

going to see in Scratch and C and Python

[68:20]

and these other languages I alluded to

[68:21]

earlier. Everything I've just

[68:23]

highlighted here, henceforth, we're

[68:25]

going to think of as functions.

[68:26]

Functions are verbs or actions that

[68:28]

really get some small piece of work done

[68:31]

for you. Functions are verbs or actions.

[68:34]

Here though, highlighted is the

[68:35]

beginning of what we'll call

[68:36]

conditionals. Conditional is like a fork

[68:38]

in the road. Do I go this way? Do I go

[68:40]

this way? Or some other way altogether.

[68:42]

How do you decide what road to go down?

[68:45]

We're going to call these questions you

[68:47]

ask yourself Boolean expressions. Named

[68:50]

after a mathematician, Boole. And a

[68:51]

boolean expression is just a question

[68:53]

that has a yes or no answer or a true or

[68:56]

false answer or a one or zero answer

[68:59]

it's just a binary state, yes or no,

[69:02]

typically. Otherwise we have this go

[69:04]

back to go back to which is what we're

[69:06]

generally going to call a loop which

[69:08]

somehow induces cyclical behavior again

[69:11]

and again. And those functions and those

[69:13]

conditionals, boolean expressions and

[69:15]

loops and a few other concepts are

[69:16]

pretty much what will underlie all of the

[69:18]

code that we write whether it is in

[69:21]

Scratch, C, or something else altogether.
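
As a small illustration of those four terms, here is a Python sketch with each one labeled in a comment. The function and the sample names are made up for this example and are not from the lecture.

```python
def count_starting_with(names, letter):  # a function: a verb, some small piece of work
    count = 0
    for name in names:                   # a loop: do the same thing again and again
        if name.startswith(letter):      # a conditional, a fork in the road;
            count += 1                   # name.startswith(letter) is the Boolean
    return count                         # expression, answering yes or no

print(count_starting_with(["John", "Jane", "Mary"], "J"))  # 2
```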

[69:24]

But we need to get to that point and in

[69:26]

fact let's go and infer what this

[69:29]

program here does. At the end of the

[69:31]

day, computers only understand zeros and

[69:33]

ones. So I claim here is a program of

[69:35]

zeros and ones. What does it do?

[69:39]

Anyone

[69:41]

want to guess? I mean, we could spend

[69:42]

all day converting all of these zeros

[69:44]

and ones to numbers, but they're not

[69:46]

going to be numbers if it's code. What

[69:47]

do you think?

[69:49]

>> That's amazing. It does in fact print

[69:53]

hello world.

[69:55]

All right. So, no one except like maybe

[69:57]

you and me and a few others in the room

[69:58]

should know, and that was probably a guess

[70:00]

admittedly or advancing on the slide.

[70:02]

But why is that? Well, it turns out that

[70:03]

not only do computers standardize

[70:05]

information, data like numbers and

[70:08]

letters and colors and other things,

[70:09]

they also standardize instructions. And

[70:11]

so, if you've heard of companies like

[70:13]

Intel or AMD or Nvidia or others, among

[70:17]

the things they do is they decide as a

[70:18]

company what pattern of zeros and ones

[70:21]

shall represent what functionality. And

[70:23]

it's very low-level functionality. those

[70:25]

companies and others decide that some

[70:27]

pattern of zeros and ones means add two

[70:30]

numbers together or subtract or

[70:32]

multiply. Another pattern might mean

[70:34]

load information from the computer's

[70:35]

hard drive into memory. Another might

[70:37]

mean store it somewhere else. Another

[70:40]

might mean print something out to the

[70:42]

screen. So nested somewhere in here and

[70:44]

admittedly I have no idea which pattern

[70:45]

of, because it's not interesting enough

[70:47]

to go figure it out at this level says

[70:50]

print. And somewhere in there, like this

[70:52]

gentleman proposed, I bet we could find

[70:54]

the representation of H, which was 72

[70:58]

and E and L and L and O and everything

[71:01]

that composes hello world. Because, as

[71:02]

it turns out in programming circles, the

[71:04]

very first program that students

[71:06]

typically write is that of hello world.
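
A brief Python sketch of the data half of that claim: the letters of the message written out as numbers and then as patterns of zeros and ones. The exact text and capitalization on the slide are assumptions here. These are only the bits for the characters, not the machine instructions that tell the computer to print them.

```python
text = "Hello, world"  # assumed spelling; the slide's exact text is not known

codes = list(text.encode("ascii"))      # one number per character
bits = [f"{code:08b}" for code in codes]  # each number as eight bits

print(codes[0])  # 72, the code for H
print(bits[0])   # 01001000
print(" ".join(bits))
```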

[71:09]

Now, this one here is written in a much

[71:12]

more intelligible way. Even if you're

[71:14]

not a programmer, odds are if I asked

[71:15]

you, what does this program do? you

[71:17]

would have said,

[71:19]

"Oh, hello world." Even though there's a

[71:21]

lot of clutter here, like no idea what

[71:22]

this is until next week. int main(void).

[71:24]

That looks cryptic. There's these weird

[71:26]

curly braces, which we rarely use in the

[71:28]

real world, but at least I understand a

[71:29]

few words like hello and world. And this

[71:32]

is kind of familiar: printf. It's

[71:34]

not print, but it's probably the same

[71:35]

thing. So, here too is an example of

[71:37]

this hierarchy. Back in the day, in the

[71:40]

earliest days of computers, humans were

[71:42]

writing code by representing zeros and

[71:45]

ones. If you've ever heard your parents

[71:46]

talk about punch cards or the like,

[71:47]

you're effectively representing patterns

[71:49]

that tell the computer what to do or

[71:51]

what to represent, like literally holes

[71:53]

in paper. Well, pretty quickly early on

[71:55]

this got really tedious, only writing

[71:57]

code at such a low level. So, someone

[71:59]

decided, you know what, I'm going to put

[72:00]

in the effort. I'm going to figure out

[72:02]

what patterns of zeros and ones I can

[72:04]

put together so as to be able to convert

[72:07]

something more user friendly to those

[72:10]

zeros and ones. And as a teaser for next

[72:12]

week, that person invented the first

[72:14]

compiler. A compiler is just a program

[72:16]

that translates one language to another.

[72:18]

And more modernly, this is a language

[72:20]

called C, which we'll spend a few weeks

[72:21]

on together because it's so fundamental

[72:24]

to how the computer works. Even this is

[72:26]

going to get tedious by like week six of

[72:28]

the class. And this is going to get

[72:29]

stupid. This is going to get annoying.

[72:30]

This is going to get cryptic. We're just

[72:32]

going to write print hello on the screen

[72:35]

in order to use a different language

[72:36]

called Python. Why? Because someone

[72:38]

wrote in C a program that can convert

[72:42]

Python, this is a white lie, to C which

[72:45]

can then be converted to zeros and ones

[72:47]

and so forth. So in computing there's

[72:49]

this principle of abstraction where we

[72:51]

start with the basics and thank god we

[72:53]

can all trust that someone else solved

[72:54]

these really hard problems or way uh

[72:57]

long ago. Then they wrote programs to

[72:59]

make it easier. We wrote programs to

[73:01]

make it easier. You can now write code

[73:02]

like I did with the chatbot to make

[73:04]

things even easier. Why? Because OpenAI

[73:06]

and other companies have abstracted away

[73:08]

a lot of the lower level implementation

[73:10]

details. And that's where I think this

[73:12]

stuff gets really exciting. We can stand

[73:14]

on the shoulders of others so long as we

[73:15]

know how to use and assemble these kinds

[73:18]

of building blocks. And speaking of

[73:20]

building blocks, let's start here. Now,

[73:22]

odds are some of you might have started

[73:23]

here in like grade school playing with

[73:25]

Scratch. And it's great for like after

[73:26]

school programs, learning how to

[73:28]

program. And you probably used this

[73:30]

language to make games and graphics and

[73:32]

just maybe playful art or the like. But

[73:34]

in Scratch, which is a graphical

[73:36]

programming language designed about 20

[73:38]

years ago from our friends down the road

[73:39]

at MIT's Media Lab, it represents pretty

[73:42]

much everything we're going to be doing

[73:44]

fundamentally over the next several

[73:46]

weeks in more modern languages like C

[73:49]

and Python, more textual languages, if

[73:51]

you will. I bet I could ask the group

[73:53]

here, what does this program do when you

[73:55]

click a green flag? Well, it says hello

[73:58]

world on the screen. Because with

[74:00]

Scratch, you have the ability to express

[74:02]

yourself with functions and loops and

[74:04]

conditionals and all of this, but by

[74:06]

using drag and drop puzzle pieces. So,

[74:09]

what we're about to do is this. We're

[74:10]

going to go on my screen to

[74:11]

scratch.mit.edu.

[74:13]

It's a browser-based programming

[74:14]

environment, and we're only going to

[74:15]

spend one week, really a few days in

[74:18]

CS50 on this language. But the

[74:20]

overarching goal is to one make sure

[74:22]

everyone's comfortable applying some of

[74:24]

these building blocks and actually

[74:25]

developing something that's interesting

[74:26]

and visual and audio as well, but to

[74:29]

also give us some visuals that we can

[74:31]

rely on and fall back on when all of

[74:33]

those curly braces and parentheses and

[74:35]

sort of stupid syntax comes back that's

[74:38]

necessary in many languages but can very

[74:40]

quickly become a distraction early on

[74:42]

from the interesting and useful ideas.

[74:45]

So what we're about to see is this in a

[74:47]

browser. This is the Scratch programming

[74:49]

environment and there's a few different

[74:50]

parts of this world. This is the blocks

[74:52]

palette, so to speak. That is to say,

[74:54]

there's a bunch of puzzle pieces or

[74:55]

building blocks that represent functions

[74:58]

and conditionals and, uh, loops and

[75:01]

other such constructs. There's going to

[75:03]

be the programming area here where you

[75:04]

can actually write your code by dragging

[75:06]

and dropping these puzzle pieces.

[75:08]

There's a whole world of sprites here.

[75:10]

By default, Scratch is, uh, a cat

[75:13]

by design, but you can make Scratch look

[75:15]

like a dog, a bird, a garbage can, or

[75:17]

anything else as we'll soon see. And

[75:19]

then this is the world in which Scratch

[75:21]

itself lives. So Scratch can go up,

[75:23]

down, left, right, and generally be

[75:25]

animated within that world. For the

[75:26]

curious, kind of like high school

[75:28]

geometry class, there's sort of this XY

[75:30]

plane here. So 0, 0 would be in the

[75:32]

middle. 0, 180 is here. 0, -180 is

[75:36]

here. Uh, -240, 0 is here, and positive 240,

[75:40]

0. Generally, you don't need to worry

[75:42]

about the numbers, but they exist. So

[75:44]

that when you say up or down, you can

[75:46]

actually tell the program go up one

[75:48]

pixel or 10 pixels or 100 pixels so that

[75:51]

you have some definition of what this

[75:52]

world actually is. All right, so let's

[75:56]

actually put this to the test. Let me go

[75:58]

ahead here and flip over to in just a

[76:00]

moment the actual Scratch website

[76:04]

whereby I'm going to have on my screen

[76:06]

in just a moment that same user

[76:08]

interface, once I've logged in, via

[76:11]

which I can actually write some code of

[76:13]

my own. Let me go ahead and zoom in on

[76:15]

the screen a little bit here and let's

[76:17]

make the simplest of these programs

[76:18]

first. Maybe a program that simply says

[76:20]

hello world. Now at a glance it's kind

[76:22]

of overwhelming how many puzzle pieces

[76:24]

there are. And honestly, even over 20

[76:25]

years, I've never used them all. And MIT

[76:27]

occasionally adds to it. But the point

[76:28]

is that they're color-coded to resemble

[76:31]

the type of functionality that they

[76:33]

offer. And also, it's meant to be the

[76:35]

sort of thing where you can just kind of

[76:36]

scroll through and get a visual sense of

[76:38]

like what you could do and then figure

[76:40]

out how you might assemble these puzzle

[76:41]

pieces together. So, I'm going to go

[76:43]

under this yellow or orangish category

[76:46]

here to begin with. So, there exists in

[76:48]

the world of Scratch not quite the same

[76:50]

jargon that I'm using now. functions and

[76:53]

conditionals and loops. That's more of

[76:55]

the programmer's way. This is more of

[76:56]

the child-friendly way, but it's really

[76:58]

the same idea. Under events, you have

[77:01]

puzzle pieces that represent things that

[77:02]

can happen while the world is running.

[77:06]

So, for instance, the first one here is

[77:07]

sort of the canonical when the green

[77:09]

flag is clicked. Why is that relevant?

[77:11]

Well, in the two-dimensional world that

[77:13]

Scratch lives in, there's a stop sign,

[77:14]

which means stop, and there's a green

[77:16]

flag, which means go. So, I can

[77:18]

therefore drag one of these puzzle

[77:20]

pieces over here so that when I click

[77:22]

that green flag, the cat will in fact do

[77:24]

something for me. Doesn't really matter

[77:26]

where I drop it, so long as it's

[77:27]

somewhere in the middle here. I'm going

[77:29]

to go ahead and let go. Now, I want the

[77:31]

look of the cat to change. I want to see

[77:33]

like a cartoon speech bubble come out

[77:35]

for now. So, I'm going to go under looks

[77:37]

here. And there's a bunch of different

[77:39]

ways to say things and think things. I'm

[77:41]

going to keep it simple and just drag

[77:42]

this one here. And now notice when I get

[77:44]

close enough to that first puzzle piece,

[77:47]

they're sort of magnetic and they want

[77:48]

to snap together. So I can just let go

[77:50]

and boom, because they're a similar

[77:52]

shape, they will lock together

[77:53]

automatically. And notice too, if I zoom

[77:55]

in here, the white oval, which by

[77:58]

default says hello, is actually editable

[78:00]

by me because it turns out that some

[78:03]

functions can take arguments or more

[78:06]

generally inputs that influence their

[78:08]

behavior. So, if I kind of click or

[78:09]

double click on this, I can change it to

[78:11]

the more canonical hello world or hello

[78:13]

David or hello whatever I want the

[78:15]

message to be. I'm going to go ahead and

[78:17]

zoom out. And now over here at top

[78:19]

right, notice that I can very simply

[78:21]

click the green flag. And I'll have

[78:23]

written my first program in Scratch. I

[78:26]

clicked the green flag, it said go. And

[78:28]

now notice it's sort of stuck on that

[78:30]

because I never said stop saying go. But

[78:32]

that's where I can click the red stop

[78:34]

sign and sort of get the cat back to

[78:36]

where I want it. So think about for just

[78:38]

a moment what it is we just did. So on

[78:40]

the one hand we have a very obvious

[78:43]

puzzle piece that says say and it said

[78:45]

something but it really is a function

[78:46]

and that function does take an input

[78:49]

represented by the white oval here

[78:51]

otherwise known as an argument or a

[78:53]

parameter. But what this really is is

[78:55]

just an input to the function. And so we

[78:57]

can map even this simple Scratch

[79:00]

program onto our model of problem

[79:02]

solving before with an addition of what

[79:04]

we'll call moving forward a side effect.

[79:06]

A side effect in a computer program is

[79:09]

often something that happens visually on

[79:11]

the screen or maybe audibly out of a

[79:14]

speaker. It's something that just kind

[79:15]

of happens as a result of you using a

[79:17]

function like a speech bubble appearing

[79:19]

on the screen. So here more generally is

[79:21]

what we claimed it represents the

[79:23]

solving of a problem. And let's just

[79:25]

consider what the input is. The input to

[79:27]

this problem say something on the screen

[79:30]

is this white oval here that I typed in.

[79:32]

Hello world. The algorithm, the

[79:34]

step-by-step instructions are not

[79:36]

something really I wrote like our

[79:37]

friends at MIT implemented that purple

[79:40]

say block. So someone there knows how to

[79:43]

get the cat to say something out of its

[79:45]

uh comical mouth. So the algorithm

[79:47]

implemented in code is really equivalent

[79:50]

to the say function. So a function is

[79:52]

just a piece of functionality

[79:54]

implemented in code which in turn

[79:57]

implements an algorithm. So algorithm is

[79:58]

sort of the concept and the function is

[80:00]

actually the incarnation of it in code.
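
For readers who prefer text to puzzle pieces, here is a minimal Python sketch of the same idea. The function name `say` is borrowed from the Scratch block purely for illustration (it is not a Python built-in), and printing to the terminal is only a stand-in for the speech bubble.

```python
def say(message):
    """Display the message; the printed text is the side effect."""
    print(message)


# The argument plays the role of the white oval in the say block.
say("hello, world")
```

The function takes one input and returns nothing useful; everything it accomplishes is visible on the screen, which is what the lecture calls a side effect.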

[80:03]

What's the output? Well, hopefully it's

[80:04]

this side effect seeing the speech

[80:06]

bubble come out of the cat's mouth like

[80:09]

this. All right, so that's one such

[80:12]

program, but it's always going to play

[80:14]

and look the same. What if I actually

[80:16]

want to prompt the human for their

[80:18]

actual name? Well, let me go back to the

[80:21]

puzzle pieces here. Let me go ahead and

[80:23]

throw this whole thing away. Okay. And

[80:24]

if you want to delete blocks, you can

[80:25]

either right-click or control-click and

[80:27]

choose from a menu. Or you can just drag

[80:28]

them there and sort of let go and

[80:30]

they'll disappear. I'm going to go back

[80:32]

in and get another uh another event

[80:34]

block, even though I could have reused

[80:36]

that same one. I'm going to go ahead and

[80:38]

go under sensing now. And if I zoom in

[80:40]

over here, you'll see a whole bunch of

[80:42]

things like I can sense distance and

[80:44]

colors. But more pragmatically, I can

[80:46]

use this function in blue, ask

[80:49]

something, and then wait for the answer.

[80:51]

And what's different about this puzzle

[80:53]

piece is that it too is yes a function.

[80:55]

It too takes an argument, but instead of

[80:58]

having an immediate side effect like

[80:59]

displaying something on the screen, it's

[81:02]

essentially inside of the computer going

[81:04]

to hand me back the response. It's going

[81:07]

to return a value, so to speak. And a

[81:10]

return value is something that the code

[81:12]

can see, but the human can't. A side

[81:14]

effect is something the human sees, but

[81:15]

a return value is something only the

[81:17]

computer sees. It's like the computer is

[81:19]

handing me back the user's input. So,

[81:21]

how does this work? Well, notice, and

[81:23]

this is a bit strange. This isn't

[81:24]

usually how variables work, but Scratch

[81:26]

too supports variables, and that was a

[81:28]

word I used quickly at the very start

[81:30]

when we were making the chatbot. A

[81:32]

variable like in math, X, Y, or Z, just

[81:34]

stores some value, but it doesn't have to

[81:36]

store a number. In code, it can store

[81:38]

like a human name. So, what's going to

[81:40]

happen when I use this puzzle piece is

[81:42]

that once the human types in their name

[81:43]

and hits enter, MIT, or really Scratch

[81:46]

is going to store the answer, the

[81:48]

so-called return value in a variable

[81:50]

that's designed to be called answer.
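
As a rough text equivalent, the sketch below contrasts a return value with a side effect in Python. The function name `ask` and its `read` parameter are assumptions made for this illustration; `read` defaults to the built-in `input()` and exists only so the example can run without a keyboard.

```python
def ask(question, read=input):
    """Show the question and return whatever the user types."""
    return read(question + " ")


# Simulate a user who types "David" so the example runs unattended.
answer = ask("What's your name?", read=lambda prompt: "David")

# Until this line, the value lived only in the variable; printing it is
# a separate step with its own side effect.
print(answer)
```

Calling `ask("What's your name?")` with no second argument would prompt at the terminal instead. Unlike Scratch, Python does not store the result in a predefined variable; the code has to assign it to a name such as `answer` itself.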

[81:52]

But, as we'll see, you can make your own

[81:55]

variables down the line if you want and

[81:57]

call them anything you want. But, let me

[81:59]

go ahead and zoom out. Let me drag this

[82:01]

over here. I'm going to use the default

[82:02]

question, what's your name? But I could

[82:04]

certainly change the text there. And let

[82:06]

me go under looks again. Let me go ahead

[82:09]

and grab the say block and let me go

[82:12]

ahead and say just for consistency like

[82:14]

hello,

[82:16]

okay? And now let me go under maybe

[82:19]

sensing I want to say how do I want to

[82:21]

say this answer. Well, notice this. The

[82:23]

shapes are important. This too is an

[82:25]

oval even though it's not white but

[82:26]

that's just because it's not editable.

[82:27]

It's going to be handed to me by the ask

[82:30]

function. Let me zoom out and grab a

[82:32]

second say block like this. And notice

[82:36]

it will magnetically clip together. I

[82:38]

don't want to say hello again. So, I

[82:40]

could delete that. But now it's still

[82:42]

the same shape even though it's a little

[82:43]

smaller. Let me go back to sensing. And

[82:44]

notice what can happen here. When you

[82:46]

have values like words inside of a

[82:49]

so-called variable, you can use those

[82:51]

instead of manual input at your

[82:53]

keyboard. And notice it too wants to

[82:54]

magnetically snap into place. It'll grow

[82:57]

to fit that variable because the shape

[82:58]

is the same. And now let's do this. Let

[83:01]

me click the green flag at right. I'm

[83:03]

seeing quote unquote what's your name?

[83:05]

I'm getting a text box this time, like

[83:06]

on a web page for instance. Let me type

[83:09]

in my name and watch closely what comes

[83:10]

out of the cat's mouth as soon as I

[83:12]

click the check mark or hit enter.

[83:16]

Huh. Okay, I got my name right, but let

[83:18]

me do it once more. Let me stop and

[83:19]

start. D-A-V-I-D.

[83:22]

Enter. No, it didn't work. Let me try

[83:25]

one other. Maybe it's my name. Let's try

[83:27]

Kelly. Enter. What's missing? Obviously,

[83:33]

the the hello. There's a bug, a mistake

[83:35]

in this program. But is there like what

[83:37]

explains this? Even if you've never

[83:38]

programmed before, intuitively, what

[83:40]

could explain why I'm not seeing hello?

[83:46]

>> Exactly. It's on two different lines.

[83:47]

So, it's doing one after the other. So,

[83:49]

it is happening. It's just you and I, as

[83:51]

the slowest things in the room, are just

[83:53]

not seeing it in time because it's

[83:55]

happening so darn fast. Because my

[83:57]

computer is so, you know, so new and so

[83:59]

fast, it's happening, but way too

[84:00]

quickly. So, how can we solve this? So

[84:02]

we can solve this in a few different

[84:04]

ways. And this is where in Scratch at

[84:06]

least for problem set zero, wherein

[84:07]

you'll have an opportunity to play

[84:09]

around with this. I can scroll around

[84:10]

here and okay under control I see

[84:13]

something like wait. So I can just

[84:14]

kind of slow things down. And now notice

[84:17]

too if you hover over the middle of two

[84:19]

blocks if it's the right shape it'll

[84:20]

just snap into the middle too. Or you

[84:23]

can just so you know kind of drag things

[84:25]

away to magnetically separate them. But

[84:27]

this might solve this. So let me hit

[84:28]

stop and then start. D-A-V-I-D. Enter.

[84:33]

Hello, David. All right, that was a

[84:35]

little Let's do like maybe two seconds

[84:37]

to see it again. Green flag. D-A-V-I-D.

[84:40]

Enter. Hello,

[84:42]

David. All right, it's working better.

[84:45]

It's sort of more correct because I'm

[84:46]

seeing the hello and the David, but kind

[84:49]

of stupid, right, to see one and then

[84:51]

the other. Wouldn't it be nice to say it

[84:52]

all in one breath, so to speak? Well,

[84:54]

here's where we can maybe compose some

[84:56]

ideas. So, let me get rid of this wait

[84:58]

and the additional block. Let's confine

[84:59]

ourselves to just one say block. But let

[85:02]

me go down to operators where we

[85:04]

haven't been before. And this is

[85:06]

interesting. There's this bigger oval

[85:08]

here that says join two things like

[85:10]

apple and banana. And those are just

[85:12]

random placeholder words that you can

[85:13]

override with anything you want. But

[85:15]

they're both ovals and white, which

[85:16]

means I can edit them. So let me go

[85:18]

ahead and do this. Let me drag this on

[85:20]

top of the say block. And this is just

[85:22]

going to therefore uh override the hello

[85:25]

I put there. Now I don't want to say

[85:27]

apple or banana, but I do want to say

[85:29]

hello,

[85:31]

and I then want to say my name. Okay, so

[85:33]

now I can go back to sensing, go back to

[85:35]

answer, drag and drop this here. That'll

[85:39]

snap into place. And let me zoom in. Now

[85:41]

what I've done is take a function and on

[85:43]

top of it I've nested another function,

[85:45]

the join function that takes two

[85:47]

arguments or inputs and presumably joins

[85:50]

them together as per its name. So let's

[85:52]

see what this does for us. Let me click

[85:54]

stop and start. I'll type in David

[85:57]

enter. And it's so close. Now, this is

[86:00]

just kind of an aesthetic bug. What have

[86:01]

I done wrong here?

[86:04]

There's no space. So, it looks a little

[86:05]

wrong, but that's an easy fix. I just

[86:07]

need to literally go into the hello

[86:09]

block after the comma, hit the space

[86:11]

bar, so that now when I stop and start

[86:13]

again and type in David, now I see

[86:15]

something that's closer to the grammar

[86:17]

we might typically expect syntactically

[86:19]

here. All right. So, let's model this

[86:22]

after what we just saw earlier. We've

[86:24]

now introduced a so-called return value.

[86:27]

And this return value is something we

[86:29]

can then use in the way we want. It's

[86:31]

not happening immediately like the

[86:32]

speech bubble. It's clearly being passed

[86:34]

to me in some way that I can use to plug

[86:37]

in somewhere else like into that join

[86:38]

block. So if we consider the role of

[86:40]

these variables playing, let's consider

[86:42]

the picture now as follows. If the input

[86:44]

now to the first function, the ask block

[86:47]

is what's your name? Quote unquote,

[86:49]

that's indeed being fed into the ask

[86:51]

block. And the result this time is not a

[86:53]

speech bubble. It's not some immediate

[86:54]

visual side effect. It is the answer

[86:58]

itself stored in a so-called variable as

[87:01]

represented by this blue oval.

[87:03]

Meanwhile, what I want to do is combine

[87:06]

that answer with some text I came up

[87:08]

with in advance by kind of stacking

[87:10]

these things together. Now, visually in

[87:12]

Scratch, you're stacking them on top,

[87:13]

but it's really that you're passing one

[87:15]

into the other into the other because

[87:17]

much like math when you have the

[87:19]

parenthesis and you're supposed to do

[87:20]

what's inside the parenthesis and then

[87:21]

work your way out. Same idea here. You

[87:24]

want to join hello and answer together.

[87:26]

And whatever that output is, that then

[87:28]

becomes the input to the say block,

[87:30]

which like in math is outside of the

[87:33]

join block itself. So pictorially, it

[87:35]

might now look like this. There's two

[87:37]

inputs to this story. Hello, comma,

[87:39]

space, and the answer variable. The

[87:42]

puzzle piece in question is join. Its

[87:44]

goal in life had better be to give me

[87:46]

the full phrase that I want. Hello,

[87:48]

David. Let's shift everything over now

[87:50]

because that output is about to become

[87:52]

the input to the say block which itself

[87:55]

will now have the so-called side effect.
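
The nesting described here, where the output of one function becomes the input of another, can be sketched in Python as follows. The helper names `join` and `say` mirror the Scratch blocks and are defined here only for illustration; the hard-coded `answer` stands in for whatever the ask block would have returned.

```python
def join(first, second):
    """Return the two pieces of text glued together."""
    return first + second


def say(message):
    """Display the message (the side effect)."""
    print(message)


answer = "David"  # stands in for the value handed back by the ask block

# The inner call runs first; its return value is the input to say.
say(join("hello, ", answer))  # prints "hello, David"
```

As in the lecture, the space after the comma has to be part of the first argument: `join("hello,", "David")` would produce `hello,David`.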

[87:58]

And so this too is what programming and

[88:01]

in turn what computer science is about

[88:02]

is composing with the solutions to

[88:05]

smaller problems solutions to bigger

[88:08]

problems using those component pieces.

[88:10]

And that's what each of these puzzle

[88:11]

pieces represents is a smaller problem

[88:13]

that someone else or maybe even you has

[88:16]

already solved. Now, we can kind of

[88:17]

spice things up here. If I go back to

[88:20]

Scratch's interface, we don't have to

[88:21]

use just the puzzle piece here. I can do

[88:23]

something like this. Let me go ahead and

[88:24]

drag these apart and get rid of the say

[88:26]

block down here. Just for fun, there's

[88:28]

all these extensions that you can add

[88:29]

over the internet to your own Scratch

[88:32]

environment. And if I go to like text to

[88:34]

speech down here, I can, for instance,

[88:36]

do uh a speak block instead of a say

[88:39]

block colored here in green. I can now

[88:41]

reconnect the join block in here. And if

[88:44]

we could raise the volume just a little

[88:45]

bit. Let me stop the old version, start

[88:47]

the new version, type in my name, and

[88:49]

hear what Scratch actually sounds like.

[88:53]

>> Hello, David.

[88:54]

>> Okay, not very cat-like, but we can kind

[88:56]

of waste some time on this by like

[88:58]

dragging the set voice to box. And I can

[89:00]

put this anywhere I want above the speak

[89:04]

block. So, I'm just going to put it

[89:05]

here, even though I've already asked a

[89:06]

question. Maybe kitten sounds

[89:08]

appropriate. Let's try again. Dav

[89:11]

>> meow meow.

[89:13]

>> Okay. And then let's see, uh, giant. Little

[89:17]

creepier. Here we go. D-A-V-I-D. And

[89:20]

lastly,

[89:21]

>> hello David.

[89:23]

>> All right. A little ransom-like instead.

[89:25]

All right. So, that's just some

[89:26]

additional puzzle pieces, but really

[89:27]

just the same idea, but I like that

[89:29]

we've introduced some sound. So, let's

[89:30]

do this. Let me go ahead and throw away

[89:32]

a lot of those puzzle pieces, leave

[89:33]

ourselves with just the when green flag

[89:35]

clicked, and play around with some other

[89:37]

building blocks that we've seen already

[89:39]

thus far. Let me go ahead, for instance,

[89:41]

under sound, and let's make the cat

[89:43]

actually meow. So, it turns out Scratch

[89:45]

being a cat by default comes with some

[89:47]

sounds by default like meowing. So, if

[89:49]

we go ahead and click the green flag

[89:51]

after programming this program, let's

[89:54]

hear what he sounds like now.

[89:56]

Okay, kind of cute. And if you want

[89:58]

Scratch to meow twice, you can just

[90:00]

play the game again.

[90:03]

And a third time. All right, but that's

[90:05]

going to get a little tedious as cute as

[90:07]

it is. So, I can solve that. Let's just

[90:09]

grab three of the puzzle pieces and just

[90:11]

drag them together and let them connect.

[90:12]

And now click the green flag.

[90:16]

All right. It gets less cute

[90:18]

quickly, but maybe we can slow it down

[90:20]

so that the cat doesn't sound so so

[90:21]

hungry. Maybe let me go under uh let's

[90:24]

see under control. Let's grab one of

[90:26]

those. Wait one second and maybe plop a

[90:29]

couple of these in the middle here. That

[90:30]

might help things. And now click the

[90:32]

green flag.

[90:35]

Okay. Still a little hungry, but let's

[90:37]

see if we change it to two. And then I

[90:39]

change it to two down here in both

[90:41]

places. Let's play it again.

[90:46]

Okay, cuter maybe, but now I'm venturing

[90:49]

into badly programmed territory. This is

[90:52]

correct. If my goal is to get the cat to

[90:53]

meow three times, pausing in between.

[90:56]

Sorry, three times pausing in between.

[90:58]

What is bad about this code? Even if

[91:02]

you've never programmed before, though.

[91:04]

Yeah, in the middle.

[91:07]

>> Yeah, I literally had to repeat myself

[91:08]

three times. Essentially copy pasting.

[91:10]

And frankly, I could have been really

[91:11]

lazy and I could right-click or

[91:13]

control-click and I could have chosen

[91:14]

duplicate. But generally, when you copy

[91:17]

paste code or when you duplicate puzzle

[91:19]

pieces, you're probably doing something wrong.

[91:22]

Why? It's solving the problem correctly,

[91:25]

but it's not well designed. If

[91:27]

only because when I change the number of

[91:29]

seconds, now I had to change it in two

[91:30]

places. So, I had one initially, then I

[91:33]

had to change it to two. And if you just

[91:35]

imagine in your mind's eye having not

[91:36]

like six puzzle pieces but 60 or 600 or

[91:40]

6,000, you're going to screw up

[91:42]

eventually if it's on you to remember to

[91:43]

change something here and here and here

[91:45]

and here. Like you're going to mess up.

[91:47]

It's better to keep things simple and

[91:49]

ideally centralized by factoring out

[91:52]

common functionality. And clearly

[91:54]

playing sound and waiting is something

[91:56]

I'm doing at least twice if not a third

[91:58]

time here as well. So how can we do this

[92:00]

better? Well, remember this thing loops.

[92:03]

Maybe we can just do something a little

[92:04]

more cyclically. So I tell the computer

[92:06]

to do something once, but I tell it how

[92:08]

many times to do that altogether. So

[92:10]

notice here by coincidence under control

[92:12]

I have a repeat block which doesn't say

[92:15]

loop, but that's certainly the right

[92:16]

semantics. Let me go ahead and drag the

[92:18]

repeat block in and I'll change the 10

[92:20]

to three just for consistency here. I'm

[92:22]

going to go back to sound. I'm going to

[92:23]

go ahead and play sound meow until done

[92:26]

just as before. And just so it's not

[92:28]

meowing too fast under control, I'm

[92:30]

going to grab a wait one second and

[92:32]

keep it inside the loop. And notice that

[92:34]

the loop here is sort of hugging these

[92:36]

puzzle pieces by growing to fill however

[92:39]

many pieces I actually cram in there. So

[92:42]

now if I click play, the effect is going

[92:44]

to be the same, but it's arguably not

[92:45]

only correct, but also well

[92:49]

designed because now if I want to change

[92:52]

the wait, change it in one place. If I

[92:54]

want to change the total number of

[92:55]

times, change it in one place. So I've

[92:57]

modularized the code and made it better

[93:00]

designed in this case. But now this is

[93:02]

silly because even though I want the cat

[93:06]

to meow, it feels like any program in

[93:08]

which I want this cat to meow, I have to

[93:10]

make these same puzzle pieces and

[93:11]

connect them together. Wouldn't it be

[93:13]

nice to invent the notion of meowing

[93:15]

once and then actually have a puzzle

[93:17]

piece called meow? So when I want the

[93:18]

cat to meow, it will just meow. Well, I

[93:21]

can do that, too. Let me scroll down to

[93:23]

my blocks here in pink. I'm going to

[93:26]

click make a block and I'm going to

[93:27]

literally make a new puzzle piece that

[93:29]

MIT didn't think of called meow. And I'm

[93:32]

going to go ahead and click okay. Now I

[93:34]

have in my code area here a define block

[93:38]

which literally means define meow as

[93:41]

follows. So how am I going to do this?

[93:43]

Well, I'm going to propose that meowing

[93:45]

just means to play the sound meow until

[93:48]

done and then wait 1 second. And notice

[93:51]

now I have nothing inside my actual

[93:53]

program which begins when I click the

[93:55]

green flag. But notice at top left

[93:58]

because I made a block called meow, I

[94:00]

now have access to one that I can drag

[94:02]

and drop. So now I can drag meow into this

[94:06]

loop. And per my comment about

[94:09]

abstracting the lower level

[94:11]

implementation details away, I'm going

[94:13]

to sort of unnecessarily dramatically

[94:15]

just move that out of the way. It still

[94:16]

exists. I didn't delete it, but now out

[94:18]

of sight, out of mind. Now, if you agree

[94:21]

with me that meow means for the cat to

[94:23]

make a sound, we've abstracted away what

[94:25]

it means mechanically for the cat to say

[94:28]

that sound. And so, we now have our own

[94:30]

puzzle piece that I can just now use

[94:32]

forever because I invented the meow

[94:33]

block already. Now, I can do one better

[94:36]

than this. It would be nice if I could

[94:37]

just tell the meow block how many times

[94:38]

I want it to meow because then I don't

[94:40]

need to waste time using loops either

[94:42]

myself. So, let me do this. Let me zoom

[94:44]

out and let me go back to my define

[94:48]

block. Let me right-click or

[94:49]

control-click and just edit it. Or I

[94:51]

could delete it and start over, but I'll

[94:53]

just edit it. And specifically, let me

[94:54]

say, you know what, let's add an input,

[94:57]

otherwise known as an argument, to this

[94:58]

meow block. And we'll call it maybe n

[95:01]

for the number of times I want it to

[95:02]

meow. And just to be super clear, I'm

[95:05]

going to add a label, which has no

[95:06]

functional impact, but it just helps me

[95:07]

remember what this does. So, I'm going

[95:09]

to say meow n times, so that when I see

[95:11]

the puzzle piece, I know what the N

[95:13]

actually represents. If I now click

[95:15]

okay, my puzzle piece looks a little

[95:17]

different at top left. Now it has the

[95:19]

white oval into which I can type or drag

[95:21]

input. Notice down here in the define

[95:24]

block, I now see that same input called

[95:27]

N. So what I can do now is this. Let me

[95:29]

go under control. Drag the repeat

[95:32]

block here. And I have to do a little

[95:34]

switcheroo. Let me disconnect this. Plug

[95:37]

it inside of the repeat block. Reconnect

[95:39]

all of this. And I don't want 10. And

[95:42]

heck, I don't even want three down here

[95:43]

anymore. I can drag this input because

[95:46]

it's the right shape. And now declare

[95:48]

that meowing n times means to repeat the

[95:51]

following n times. Play sound meow until

[95:54]

done. Wait one second and keep doing

[95:56]

that n total times. If I now zoom out

[96:00]

and scroll up, notice that my usage of

[96:03]

this puzzle piece has changed such that

[96:04]

I don't actually need the repeat block

[96:06]

anymore. I can disconnect this. And

[96:09]

heck, I can actually right-click or uh

[96:11]

control-click and delete it. Just use

[96:13]

this under the green flag. Change this

[96:15]

to a three. And now I have the essence

[96:18]

of this meowing program. The

[96:21]

implementation details are out of sight,

[96:22]

out of mind. Once they're correct, I

[96:24]

don't need to worry about them again.
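
The custom block described above can be sketched in text form. This is only a rough Python analogue, not how Scratch itself is built: the function name meow, the parameter n, and the play_sound stand-in are assumptions made for illustration.

```python
def meow(n, play_sound=print):
    """Meow n times.

    play_sound is a hypothetical stand-in for Scratch's
    'play sound Meow until done' block; by default it just prints.
    """
    for _ in range(n):  # the 'repeat n' block
        play_sound("meow")
        # The Scratch version also waits one second here;
        # that is left out so the sketch runs instantly.


if __name__ == "__main__":
    meow(3)  # the caller only says how many times, not how
```

Once meow exists, the caller no longer needs a loop of its own, which is the point of hiding the implementation details.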

[96:26]

And this is exactly how Scratch itself

[96:28]

works. I have no idea how MIT

[96:30]

implemented the wait block or the

[96:32]

repeat block. Heck, there's a forever

[96:33]

block and there's a few others, but I

[96:35]

don't need to know or care because

[96:37]

they've implemented those building

[96:39]

blocks that I can then implement myself.

[96:41]

I don't necessarily know how to build a

[96:43]

whole chatbot, but on top of OpenAI's

[96:46]

API, this web-based service, I can

[96:48]

implement my own chatbot because they've

[96:50]

done the heavy lift of actually

[96:52]

implementing that for me. Well, let's do

[96:54]

just a few more examples here. Let's

[96:56]

bring the cat all the more to life. Let

[96:57]

me throw away the meowing. Let me open

[96:59]

up under when green flag clicked. How

[97:01]

about that forever block that we just

[97:03]

glimpsed? Let me go ahead and now add to

[97:05]

the mix what we called earlier

[97:07]

conditionals which allow us to ask

[97:10]

questions and decide whether or not we

[97:11]

should do something. So under this, let

[97:13]

me go ahead and under forever say if the

[97:17]

following is true. Well, what boolean

[97:21]

expression do I want to ask? Well, let's

[97:23]

implement how about this program and

[97:24]

we'll figure out if it works. Uh under

[97:26]

sensing, I'm going to grab this uh very

[97:30]

angled puzzle piece called touching

[97:31]

mouse pointer, that is, the cursor, and

[97:34]

only if that question has a yes answer

[97:37]

do I want to play the sound meow until

[97:40]

done. So let me zoom in here and in

[97:42]

English

[97:44]

what is this going to implement really

[97:48]

just describe what this program does

[97:50]

less arcanely as the code itself.

[97:52]

Yeah.

[97:57]

>> yeah if you move the mouse over the cat

[97:59]

it will make noise. So, it's kind of

[98:01]

like implementing petting a cat, if you

[98:03]

will. So, let me zoom out, click the

[98:05]

green flag, and notice nothing's

[98:07]

happening yet, but notice my puzzle

[98:09]

pieces are highlighted in yellow because

[98:11]

it is in fact still running because it's

[98:13]

doing something forever. And it's

[98:14]

constantly checking if I'm touching the

[98:17]

mouse pointer. And if so,

[98:20]

it's like I just pet the cat. Now, it

[98:22]

stopped until I move the cursor again.

[98:24]

Now, it stopped. If I leave it there,

[98:27]

it's going to keep meowing because it's

[98:28]

going to be stuck in this loop forever.
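
The loop just described, a question asked over and over inside forever, can be sketched like this. It is a simplified model under stated assumptions: touching_per_frame is a hypothetical list holding one yes/no answer to "touching mouse pointer?" per pass through the loop, and a finite list stands in for forever.

```python
def pet_cat(touching_per_frame):
    """Return the passes through the loop on which the cat meows."""
    meows = []
    # A finite list stands in for Scratch's 'forever' block.
    for frame, touching in enumerate(touching_per_frame):
        if touching:             # the Boolean expression from Sensing
            meows.append(frame)  # stand-in for 'play sound Meow until done'
    return meows


if __name__ == "__main__":
    # The cursor reaches the cat on the third pass and leaves on the fifth.
    print(pet_cat([False, False, True, True, False]))
```

As long as the answer stays yes, the cat keeps meowing on every pass, which matches leaving the cursor parked on the cat.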

[98:30]

But it's correct insofar as I'm

[98:32]

petting the cat. Let me do this though.

[98:34]

Let me make a mistake this time. Let me

[98:37]

forget about the forever and just do

[98:38]

this. And you might think this is

[98:40]

correct. Let me click the green flag

[98:42]

now. Let me pet the cat. And like

[98:45]

nothing's actually working here. Why

[98:47]

though logically?

[98:49]

Yeah.

[98:51]

>> Yeah. The program's so darn fast. It

[98:53]

already ran through the sequence. And at

[98:54]

the moment in time when I clicked the

[98:56]

green flag, no, I was not touching the

[98:58]

mouse pointer. And so it was too late by

[99:00]

the time I actually moved the cursor

[99:02]

there. But by using the forever block,

[99:04]

which I did correctly the first time,

[99:06]

this ensures that Scratch is constantly

[99:08]

checking the answer to that question. So

[99:10]

if and when I do pet the cat, it will

[99:13]

actually

[99:15]

detect as much. All right, how about a few

[99:18]

final examples before you're on your way

[99:20]

building some of your own first programs

[99:22]

with these building blocks. Let me go

[99:23]

ahead and open up a program that I wrote

[99:25]

in advance in fact about 20 years ago

[99:28]

whereby let me pull this up whereby we

[99:32]

have in this example a program I wrote

[99:34]

called Oscar time and this was the

[99:36]

result of our first assignment in this

[99:38]

class whereby when MIT was implementing

[99:41]

Scratch for the very first time we

[99:42]

needed to implement our very own Scratch

[99:44]

program as well. I'm going to go ahead

[99:46]

and full screen it here. The goal is to

[99:48]

drag as much falling trash as you can to

[99:50]

Oscar's trash can before his song ends.

[99:52]

For which one volunteer would be handy

[99:55]

here. Okay. I saw your hand go up

[99:57]

quickly in blue. Yeah. Come on up. All

[99:59]

right. So, you're playing for a stress

[100:01]

ball here if we will. At one at some

[100:03]

point, I'm going to talk over what

[100:04]

you're actually playing just so that we

[100:05]

can point out what it is we're trying to

[100:07]

glean from this program. And I'll

[100:09]

stipulate this probably took me like 8,

[100:11]

12 hours. And as you'll soon see, the

[100:12]

song starts to drive you nuts after a

[100:14]

while because I was trying to

[100:15]

synchronize everything in the game to a

[100:17]

childhood song with which you might be

[100:18]

familiar. Let me go ahead and say hello

[100:20]

if you'd like to introduce yourself.

[100:22]

>> Oh, hello. So, I'm Han and uh I'm a

[100:26]

first year student. I'm pretty excited

[100:28]

for this class.

[100:28]

>> All right, welcome. Well, here is Oscar

[100:30]

time. If you want to go ahead and take

[100:32]

control of the keyboard, all you'll need

[100:33]

to do is drag and drop trash that falls

[100:36]

from the sky into the trash can.

[100:49]

[music]

[101:00]

[music]

[101:16]

And it's around this point in the game

[101:17]

where the novelty starts [music] to wear

[101:18]

off because there's like three more

[101:20]

minutes of this game where more and more

[101:21]

stuff starts to fall from the sky. So as

[101:23]

Han, as you continue to play, I'm going

[101:24]

to cut over here. You keep playing.

[101:26]

Let's consider how I implemented this

[101:29]

whereby we'll start at the beginning.

[101:31]

The very first thing I did when

[101:32]

implementing Oscar time honestly was the

[101:34]

easy part. Like I found a lamp post that

[101:36]

looked a little something like this and

[101:38]

I made the so-called costume for the

[101:40]

whole stage. And that was it. The game

[101:42]

didn't do anything. You couldn't play

[101:43]

anything. You hit the green flag,

[101:44]

nothing happened. But then I figured out

[101:46]

how to turn the Scratch cat, otherwise

[101:49]

known more generally as a sprite, into a

[101:51]

trash can instead. And so the trash can,

[101:55]

meanwhile, is clearly animated because I

[101:57]

realized that, oh, I can give sprites

[101:59]

like the cat different costumes. So, I

[102:01]

can make the cat not only look like a

[102:03]

trash can, but if I want its lid to go

[102:05]

up, well, that's just another costume.

[102:07]

And if I want to see Oscar popping out,

[102:09]

that's just a third costume. And so, I

[102:11]

made my own simplistic animation. And

[102:14]

you can kind of see it. It's very

[102:15]

jittery step by step by step by creating

[102:19]

the illusion of animation by really just

[102:20]

having a few different images or

[102:22]

costumes on Oscar. Now, I hope you

[102:23]

appreciate how much effort went

[102:25]

into timing each of these pieces of

[102:26]

trash with the specific mention of that

[102:28]

type of piece of trash in the music.

[102:30]

Okay. 20 years later, still clinging.

[102:32]

So, you're doing amazing, by the way.

[102:34]

How do we get the trash to fall in the

[102:36]

first place? Well, at the very beginning

[102:37]

of the game, the trash just started

[102:39]

falling from some random location. What

[102:41]

does it mean for trash to fall from the

[102:42]

sky?

[102:44]

Oh, big climax here.

[102:48]

You got a lot of trash on the ground to

[102:49]

pick up.

[102:54]

There we go. And your final score is

[102:59]

a big round of applause if we could for

[103:00]

Han. [applause and cheering] Thank you.

[103:04]

Thank you. So just to be clear now,

[103:07]

let's decompose this fairly involved

[103:09]

program that took me a lot of hours to

[103:11]

make into its component parts. So this

[103:14]

is just a sprite. And I figured out

[103:15]

eventually how to change its costume,

[103:17]

change its costume, change its costume

[103:18]

to simulate some kind of animation. And

[103:20]

I also realized that oh, I don't need to

[103:22]

just have one sprite or one cat or trash

[103:24]

can. You can create a second sprite, a

[103:26]

third sprite, and many more. So I just

[103:28]

told the sprite to go to a random

[103:30]

location at Y equals 180 and X equals

[103:34]

something. I think I restricted X to be

[103:35]

in this region, which is why the trash

[103:37]

never falls from over here. I just did a

[103:39]

little bit of math based on that

[103:40]

Cartesian plane that we saw a slide of

[103:41]

earlier. And then I probably had a loop

[103:44]

that told the trash to move a pixel,

[103:46]

move a pixel, move a pixel down, down,

[103:48]

down, down until it eventually hits the

[103:50]

bottom and therefore just stops. So we

[103:52]

can actually see this step by step. And

[103:54]

this is representative of how even for

[103:56]

something like your first problem set

[103:57]

in CS50 and with Scratch specifically,

[104:00]

you might build some of the same. So,

[104:02]

I'm going to go back into uh CS50 Studio

[104:05]

for today, which is linked on the

[104:06]

courses website, which has a few

[104:08]

different versions of this and other

[104:10]

programs called Oscar 0 through Oscar

[104:13]

4, where zero is the simplest. And

[104:15]

truly, I meant it when I look inside

[104:17]

this program to see my code. Like, this

[104:19]

was it. There was no code because all I

[104:21]

did was put the sprite on the screen and

[104:23]

change it from a cat to a trash can. And

[104:25]

I added a costume uh a costume for the

[104:27]

stage, so to speak, so that the lamp

[104:29]

post would be fixated there. If I then

[104:31]

go to the next version of code, version

[104:34]

one, so to speak, then I had code that

[104:36]

did this. Now, notice there's a few

[104:38]

things going on here. At bottom left,

[104:40]

you'll see of course the trash can and

[104:42]

then at top right the trash. Here are

[104:44]

the corresponding sprites down here. So,

[104:46]

when Oscar is clicked on here, the trash

[104:48]

can, you see the code I wrote, the

[104:50]

puzzle pieces I dragged for Oscar. And

[104:52]

in a moment, when we click on trash,

[104:54]

you'll see the code I wrote or the

[104:56]

puzzle pieces I dragged and

[104:58]

dropped for the trash piece

[104:59]

specifically. So what does Oscar do?

[105:02]

Well, I first switch his costume to

[105:04]

Oscar 1, which I assume is this, the

[105:07]

closed trash can. Then forever Oscar

[105:10]

does the following. If Oscar's touching

[105:12]

the mouse pointer, then change the

[105:14]

costume to Oscar 2. Otherwise, that is

[105:17]

if not touching the mouse pointer,

[105:19]

change the costume to Oscar 1. Well,

[105:21]

what's the implication? Anytime I move

[105:23]

the cursor over the trash can, the lid

[105:25]

just pops up, which was exactly the

[105:27]

animation I wanted to achieve.
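
Oscar's script as described above is an if/else inside a loop. A minimal sketch follows, assuming the costume names "oscar1" and "oscar2" for the closed and open lid and a hypothetical list of per-pass answers to "touching mouse pointer?".

```python
def animate_oscar(touching_per_frame):
    """Return the costume Oscar wears after each pass through the loop."""
    costume = "oscar1"  # switch to the closed can before the loop starts
    history = []
    for touching in touching_per_frame:  # finite stand-in for 'forever'
        if touching:
            costume = "oscar2"  # lid up while the cursor is over the can
        else:
            costume = "oscar1"  # lid down otherwise
        history.append(costume)
    return history


if __name__ == "__main__":
    print(animate_oscar([False, True, True, False]))
```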

[105:29]

Meanwhile, if we do this and click the

[105:31]

green flag, you can see that in action,

[105:33]

even for this simple version. If I move

[105:36]

the cursor over Oscar, we have the

[105:38]

beginnings of a game, even though

[105:39]

there's no score, there's no music or

[105:41]

anything else, but I've solved one of my

[105:43]

problems. Meanwhile, if I click on the

[105:45]

trash piece here, and then you'll see no

[105:48]

code has been written for it yet. So, we

[105:50]

move on to Oscar version two and see

[105:52]

inside it. In Oscar version two, when I

[105:54]

click on trash, ah, now there's some

[105:57]

juicy stuff happening here. And in fact,

[105:59]

this trash sprite has two programs or

[106:02]

scripts associated with it. And that's

[106:04]

fine. Each of them starts with when

[106:05]

green flag clicked, which means the

[106:07]

piece of trash will do two things at

[106:09]

once essentially in parallel. The first

[106:11]

thing it will do is set drag mode

[106:13]

to draggable. And that's just a Scratch

[106:14]

thing that lets you actually move the

[106:15]

sprites by clicking on them, making them

[106:17]

draggable. Then it goes to a random X

[106:19]

location between 0 and 240. So yeah,

[106:22]

that must be what I did from the middle

[106:24]

all the way to the right. And I set y

[106:26]

always to 180, which is why the trash

[106:28]

always comes from the sky from the very

[106:30]

top. Then I said forever change your y

[106:33]

by negative one. And here's where it's

[106:35]

useful to know what 180 is, 240 is, and

[106:37]

so forth. Because if I want the trash to

[106:39]

go down, so to speak, that's changing

[106:41]

its Y by a pixel by a pixel by a pixel.

[106:44]

And thankfully MIT implemented it such

[106:46]

that if the trash tries to go off the

[106:48]

screen, it will just stop automatically,

[106:51]

even if it's inside of a forever block,

[106:52]

lest you lose control over the sprites

[106:54]

altogether. But in parallel, what's

[106:56]

happening is this. Also, when the green

[106:58]

flag is clicked, uh the trash piece is

[107:01]

doing this too forever. If touching

[107:03]

Oscar, what's it doing in blue here?

[107:08]

Sort of teleporting away. Now, to your

[107:11]

eye, hopefully it looks like it's going

[107:13]

into the trash can. But what does that

[107:15]

mean to go into the trash can? Well, I

[107:17]

just put it back into the sky as though

[107:18]

a new piece of trash is falling. So even

[107:21]

though you saw one piece of trash, two,

[107:23]

three, four, and so forth, it's the same

[107:25]

sprite just acting that out again and

[107:27]

again. So here, if I click play on this

[107:31]

program, you'll see that it starts

[107:32]

falling one pixel at a time. Because

[107:36]

it's draggable, I can sort of pull it

[107:37]

away and move it over to the trash can

[107:40]

like that. And as soon as I do, it seems

[107:42]

to go in, but really it just teleported

[107:44]

to a different X location. Still at Y=

[107:47]

180. Again, it's not much of a game yet.
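
The two scripts on the trash sprite can be approximated in one loop. This is a sketch under assumptions: the dictionary layout, the function names, and the touching_oscar callback are all hypothetical, and clamping y at -180 only approximates how Scratch keeps a sprite from leaving the stage. The "random x, y = 180" lines appear twice on purpose, mirroring this version of the scripts.

```python
import random

TOP_Y = 180      # top edge of Scratch's stage
BOTTOM_Y = -180  # bottom edge; clamping here is an approximation


def new_trash(rng):
    """Place the trash at a random x in the right half, at the very top."""
    return {"x": rng.randint(0, 240), "y": TOP_Y}


def step(trash, touching_oscar, rng):
    """Run one pass of both 'forever' loops on the trash sprite."""
    # Script 1: change y by -1, stopping at the bottom of the stage.
    trash["y"] = max(BOTTOM_Y, trash["y"] - 1)
    # Script 2: if touching Oscar, teleport back to the sky.
    if touching_oscar(trash):
        trash["x"] = rng.randint(0, 240)
        trash["y"] = TOP_Y
    return trash


if __name__ == "__main__":
    rng = random.Random(50)
    trash = new_trash(rng)
    for _ in range(5):
        step(trash, lambda t: False, rng)  # never dragged onto Oscar
    print(trash["y"])  # five pixels below the top
```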

[107:49]

There's no score. There's no music or

[107:50]

anything, but let's go to Oscar 3 now.

[107:52]

And in Oscar 3, if we scroll over to the

[107:56]

trash, even more is happening here. In

[107:58]

so far as I realized, you know what?

[108:00]

There was kind of an inefficiency before.

[108:03]

Previously, I had these two programs or

[108:05]

scripts, synonymously, whereby they both went

[108:07]

to the top by going to 0 to 240 for X

[108:12]

and then 180 for Y. And if you noticed,

[108:14]

I used that here and I used that down

[108:17]

here in both programs. Now that too is

[108:18]

kind of stupid because I literally

[108:19]

copied and pasted the same code. So if I

[108:21]

ever want to change that design, I have

[108:23]

to change it in two places and I already

[108:25]

proposed that we frown upon that. So

[108:26]

what did I do in this version? I just

[108:28]

created my own block and I decided to

[108:30]

call my own function go to top. What

[108:33]

does it mean to go to the top? Pick a

[108:34]

random x between those values and fixate

[108:37]

on y= 180 initially. Now in both of

[108:41]

those programs which are otherwise

[108:42]

identical, I just say what I mean. Go to

[108:45]

top. Go to top. And if I really wanted

[108:46]

to, I could drag this out of the way and

[108:48]

never think about it again because now

[108:50]

that functionality exists. So correct,

[108:53]

but arguably better designed. I've now

[108:55]

factored out commonality so as to use

[108:58]

and reuse my code as well. So let's go

[109:01]

up to Oscar version 4 now. And in Oscar

[109:03]

time version 4, the trash can does a

[109:07]

little something more whereby what have

[109:10]

I added to this mix even though we

[109:11]

haven't dragged this puzzle piece

[109:13]

together before?

[109:15]

Yeah. What's new?

[109:16]

>> Score.

[109:17]

>> Yeah. So, it turns out on the left here,

[109:19]

there's a variables category, which is

[109:21]

goes beyond the answer variable that we

[109:24]

just automatically get from the ask

[109:26]

block. You can create your own variables

[109:28]

X, Y, Z. But in computer and

[109:30]

programming, it's best to name things,

[109:31]

not silly simple words like X, Y, and Z,

[109:34]

but full-fledged words that say what

[109:35]

they are, like score. So, I'm setting a

[109:37]

score variable to zero. And then any

[109:40]

time the trash is touching Oscar before

[109:43]

it teleports away to the top, I change

[109:45]

the score by one. That is increment the

[109:47]

score by one. And what Scratch does

[109:49]

automatically for me is it puts a little

[109:51]

billboard up here showing me the current

[109:53]

score. So if I now play this game once

[109:57]

more, the score is going to start at

[109:59]

zero. But if I drag this trash over here

[110:01]

and even let it fall in, as soon as it

[110:03]

touches, the score goes to one. And now

[110:05]

if I click and drag again, the score is

[110:07]

going to as soon as it touches Oscar

[110:09]

going to go to two and so forth. And you

[110:12]

saw in the final flourish with Han

[110:13]

playing that once you had the sound and

[110:15]

other pieces of trash, which are just

[110:17]

really other sprites and I just had wait

[110:19]

like a minute, wait two minutes so that

[110:21]

the trumpet would fall at the right

[110:22]

time. I've broken down a fairly involved

[110:25]

program into these basic building

[110:27]

blocks. And when you too write your own

[110:29]

program, that's exactly how you should

[110:31]

approach it. Even if you have these

[110:32]

grand aspirations to do this or that,

[110:35]

start by the simple problems and figure

[110:37]

out what bites can I uh bite off in

[110:39]

order to make progress. Baby steps if

[110:42]

you will to the final solution. Well,

[110:44]

let's look at one other set of examples

[110:46]

before we have one final volunteer to

[110:48]

come up. And as you'll soon see, it's

[110:49]

tradition in CS50 to end the first class

[110:51]

with cake. So, in a moment, cake will be

[110:54]

served out in the transept. And please

[110:55]

feel free to come up and say hi and ask

[110:56]

questions if you'd like to. Let me go

[110:58]

ahead and open up though a series of

[111:00]

building blocks here via which we can

[111:02]

make so-called Ivy's hardest game which

[111:05]

is one implemented by one of your

[111:07]

predecessors, a former classmate from

[111:09]

CS50. So here we have a whole bunch of

[111:11]

puzzle pieces written by your classmates

[111:13]

but let me go ahead and zoom in on this

[111:15]

screen. You'll see that this Harvard

[111:17]

crest is my sprite. So it's not a cat,

[111:18]

it's not a trash can, it's a Harvard

[111:20]

crest and it exists in a very simple

[111:22]

two-dimensional world with two walls

[111:24]

next to it. If I click on the green

[111:26]

flag, notice that with my hands here, I

[111:29]

can go up, I can go down, I can go left,

[111:32]

and I can go right. But if I try going

[111:34]

too far right, I get stuck on the wall.

[111:36]

If I go too far left, I get stuck on the

[111:39]

wall. Well, it's sort of the

[111:41]

beginning of any animation or game. But

[111:42]

how do I do this? Well, let me go up

[111:44]

here and propose that the first thing

[111:46]

the Harvard sprite is doing is it's

[111:48]

going to the middle 0 comma 0. And it's

[111:51]

then forever listening for the keyboard

[111:54]

and feeling for walls. Now those are

[111:56]

functions I implemented myself to kind

[111:59]

of describe what I wanted the program to

[112:01]

do. And let's do the shorter one first.

[112:03]

What does it mean to feel for the walls?

[112:05]

Just to ask the question, if you're

[112:06]

touching the left wall, change your x by

[112:09]

one. If you're touching the right wall,

[112:11]

change your x by negative one.
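
A small sketch of that wall check follows. The wall positions and the idea that "touching" means reaching a wall's x coordinate are assumptions for illustration; in Scratch the touching test is done on the sprites themselves. The demo also assumes some other code keeps nudging the sprite right by one each pass.

```python
LEFT_WALL_X = -100   # hypothetical wall positions
RIGHT_WALL_X = 100


def feel_for_walls(x):
    """Back the sprite up one pixel if it is touching either wall."""
    if x <= LEFT_WALL_X:   # touching left wall
        x += 1             # change x by 1
    if x >= RIGHT_WALL_X:  # touching right wall
        x -= 1             # change x by -1
    return x


if __name__ == "__main__":
    x = 0
    for _ in range(500):
        x += 1                 # something keeps pushing the sprite right
        x = feel_for_walls(x)  # the wall check pushes it back
    print(x)  # stuck just short of the right wall
```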

[112:15]

Why have I defined touching walls in

[112:18]

this weirdly mathematical way? Yeah.

[112:22]

>> Sure. Yeah.

[112:24]

>> Like counteracts the movement.

[112:26]

Otherwise, you're like not moving.

[112:29]

>> Exactly. Because if I've gone so far

[112:31]

right that I'm touching the right wall,

[112:33]

well, I'm already kind of on top of the

[112:35]

wall a little bit. So, I effectively

[112:37]

want the sprite to bounce off of it. And

[112:39]

the easiest way to do that is just to

[112:40]

say back up one pixel as though you

[112:42]

can't go any further. And same for the

[112:43]

left wall. Meanwhile, let me scroll over

[112:45]

to the second script or program that's

[112:48]

running in parallel. It's a little

[112:49]

longer, but it's not more complicated.

[112:51]

What does it mean to listen for

[112:52]

keyboard? Well, just check. If the key

[112:56]

up arrow is pressed, change Y by one.

[112:59]

Arrow go up. Else if the key down arrow

[113:01]

is pressed, then change Y by negative 1.

[113:03]

Key right arrow is pressed, change X by

[113:05]

one, and so forth. So again, this is

[113:07]

where the math and the numbers are

[113:08]

useful because it gives you a world in

[113:10]

which to live. Up, down, left, right.

[113:12]

deconstructed into some simple

[113:14]

arithmetic values. All right, so the net

[113:17]

result is that we have a crest living in

[113:19]

this world. Well, let's add a bit of

[113:21]

competition here. And in the second

[113:22]

version of this game, let me go ahead

[113:24]

and full screen it again. Click play.

[113:26]

And now we'll see sort of an enemy

[113:28]

bouncing back and forth autonomously. So

[113:31]

there's no one playing except me. I'm

[113:33]

controlling Harvard. Yale is bouncing on

[113:34]

its own. And nothing bad's going to

[113:36]

happen if it hits me. But it does seem

[113:38]

to be autonomous. So how is this

[113:40]

working? Well, if it's doing this

[113:42]

forever, there's probably a forever loop

[113:43]

involved. So, let's see inside here.

[113:45]

Let's click not on Harvard, but on the

[113:48]

Yale sprite. And sure enough, if we

[113:50]

focus on this for a moment, we'll see

[113:52]

that the first thing Yale does is go to

[113:54]

0 comma 0. It points in direction 90°,

[113:56]

which just gives you a sense of whether

[113:58]

you're facing left or right or wherever.

[113:59]

And then it forever does the following.

[114:01]

If it's touching the left wall or

[114:03]

touching the right wall, I was a little

[114:05]

clever this time, if I may. I just kind

[114:07]

of turn around 180 degrees, which

[114:09]

effectively bounces me back in the

[114:11]

opposite direction. Otherwise, I go

[114:13]

ahead and no matter what just move one

[114:15]

step. And this is why Yale is always

[114:17]

moving back and forth. So, a quick

[114:18]

question. If I wanted to speed up Yale

[114:20]

and make this beginning of a game

[114:22]

harder, what would I do?

[114:25]

Yeah.

[114:28]

>> Yeah. So, let's have it move like 10

[114:29]

steps at a time, right? This looks like

[114:30]

a much harder game, if you will, like

[114:32]

level 10 now, because it's just moving

[114:34]

so much faster. All right. Well, let's

[114:36]

try a third version of this that adds

[114:37]

another ingredient. Let me full screen

[114:39]

this and click play. And now you'll see

[114:41]

the even smarter MIT homing in on me by

[114:46]

following my actual movements. So, this

[114:48]

is sort of like boss level material now.

[114:51]

And it's just going to follow me. So,

[114:53]

how is this working? Well, it's kind of

[114:55]

a common game paradigm, but what does

[114:57]

this mean? Well, let's see inside here.

[114:59]

Let's click on the MIT sprite. It's pretty

[115:01]

darn easy.

[115:03]

Go to some random position just to make

[115:05]

it a little interesting lest MIT always

[115:07]

start in the center and then forever

[115:08]

point towards the Harvard logo outline

[115:11]

which is the name the former student

[115:12]

gave to the costume that the sprite is

[115:14]

wearing that looks like a Harvard crest

[115:16]

and then move one step. So, corollary

[115:18]

of the previous question, how do we make

[115:19]

the game harder and MIT even faster?

[115:22]

Well, we can change this to be like 10

[115:24]

steps and now you'll see MIT is a little

[115:27]

twitchy because

[115:30]

this is kind of a visual bug. Let me

[115:31]

make it full screen.

[115:34]

Why is this visual glitch happening?

[115:37]

It's literally doing what I told it to

[115:39]

do. It just looks stupid. Yeah.

[115:45]

Say again.

[115:48]

>> Yeah. It's moving so fast that it's sort

[115:49]

of going 10 pixels this way, but then I

[115:51]

kind of it kind of overshot me. So then

[115:53]

it's doubling back to follow me again,

[115:54]

and it's doubling back this way. And

[115:56]

because these are such big footsteps, if

[115:58]

you will, it just has this visual effect

[116:00]

of twitching back and forth. So, we

[116:01]

might have to throttle that back a bit

[116:02]

and make it five or two or three instead

[116:05]

of 10 because that's clearly not

[116:06]

desirable gaming behavior here. All

[116:09]

right. Well, let's go ahead and do this.
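
The twitching described above can be reproduced in one dimension. This is a simplified sketch, not the student's code: the function name chase and its parameters are hypothetical, and staying put when already on the target is an assumption of this model.

```python
def chase(chaser_x, target_x, step, frames):
    """Point toward the target and move 'step' pixels, once per frame."""
    positions = []
    for _ in range(frames):  # finite stand-in for 'forever'
        if chaser_x < target_x:
            chaser_x += step  # target is to the right
        elif chaser_x > target_x:
            chaser_x -= step  # target is to the left
        positions.append(chaser_x)
    return positions


if __name__ == "__main__":
    print(chase(0, 5, step=1, frames=8))   # walks up to the target and stops
    print(chase(0, 5, step=10, frames=6))  # overshoots, doubles back, repeats
```

With a step of 1 the chaser settles on the target; with a step of 10 it jumps past the target every frame and has to turn around, which is the back-and-forth glitch.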

[116:10]

Let's put them all together just as your

[116:12]

former classmate did when submitting

[116:13]

this actual homework. Uh, the game will

[116:16]

conclude hopefully in an amazing climax

[116:17]

where you've won the game. So, we need

[116:19]

someone ideally with really good hand

[116:20]

eye coordination to play this final game

[116:23]

here. Yeah, your hand went up first, I

[116:25]

think. Okay, come on up. Big round of

[116:27]

applause because this is a lot of

[116:28]

pressure to [applause] end.

[116:34]

All right. So, if you win the game, cake

[116:37]

will be served. If you don't win the

[116:39]

game, there will be no cake.

[116:41]

>> Okay. But introduce yourself in the

[116:43]

meantime.

[116:43]

>> Hi, I'm Jenny Pan, freshman at Hollis

[116:47]

and I'm actually a CS major or

[116:49]

concentration.

[116:50]

>> Nice to meet you. Head to the keyboard

[116:51]

here. This now is the combination of all

[116:54]

of those building blocks and even more

[116:56]

aka Ivy's hardest game. You will be in

[116:58]

control just as I would of the Harvard

[117:00]

crest. And the goal is to make it to the

[117:02]

exit, which is this gentleman on the

[117:03]

right here. And you'll see there's

[117:04]

multiple levels where each level

[117:06]

gets a little harder. All right, here we

[117:08]

go.

[117:36]

>> [music]

[117:50]

>> [music]

[118:03]

[music]

[118:06]

[music]

[118:09]

[music]

[118:20]

>> [music]

[118:25]

>> All right, this is CS50 and this is week

[118:28]

one, our second week together. And

[118:30]

you'll recall that last week, week zero,

[118:32]

we focused on Scratch. Ultimately, this

[118:34]

graphical programming language by which

[118:36]

you can drag and drop puzzle pieces that

[118:37]

interlock together only if it makes

[118:39]

logical sense to do so. And many of you

[118:41]

had actually probably played with that

[118:43]

in like middle school or even prior at

[118:44]

some point. But for our purposes, the

[118:46]

goals of Scratch were to give us sort of

[118:48]

a mental model for some fundamental

[118:50]

constructs that we're going to see again

[118:51]

and again today in C in a few weeks in

[118:54]

Python and even thereafter. And those

[118:55]

include things like functions and return

[118:58]

values and arguments and variables

[119:01]

and loops and conditionals and more. And

[119:03]

so even if today feels like a bit of a

[119:05]

fire hose, such as that picture here,

[119:08]

appreciate that a lot of today's ideas

[119:10]

are exactly the same as last week's

[119:12]

ideas, it's just that the syntax is

[119:14]

going to change. It's going to look a

[119:15]

little different. It's going to look a

[119:16]

little scarier. It's going to be harder

[119:17]

to sort of memorize, except with

[119:19]

practice will come that muscle memory,

[119:21]

but the ideas ultimately are going to be

[119:23]

the same. And indeed, this is, if

[119:24]

unfamiliar, uh MIT down the road has a

[119:27]

tradition of hacks whereby students once

[119:28]

a year do something fairly crazy. And at

[119:30]

this point, they happen to connect an

[119:32]

actual working uh drinking fountain to

[119:35]

an actual fire hydrant. And the sign

[119:37]

there, very pixelated, says, "Getting an

[119:39]

education from MIT is like trying to

[119:41]

drink from a fire hose." And that's

[119:42]

indeed how computer science, how

[119:45]

programming, how CS50 will sometimes

[119:47]

feel, but realize that what's going to

[119:49]

be ultimately most important is not

[119:52]

where you uh feel you are day after day,

[119:54]

but where 3 months from now you feel

[119:57]

that you are relative to last week

[119:59]

alone, so-called week zero. So, let's

[120:01]

look back at what week zero looked like.

[120:03]

It looked a little something like this.

[120:04]

The simplest of programs by which we get

[120:06]

that cat to say hello world. Today,

[120:09]

that same code is going to start to look

[120:11]

a little like this, which was a glimpse

[120:13]

we gave you last week. But this time,

[120:14]

I've deliberately color-coded it to try

[120:16]

to send the message that whereas in

[120:17]

Scratch, we had this yellowish puzzle

[120:20]

piece that sort of kicked things off

[120:22]

that didn't really do anything itself,

[120:23]

but it got the program started, whereas

[120:25]

the real work was done in purple here.

[120:27]

Same is going to be true today whereby

[120:29]

I'm going to wave my hands for a little

[120:31]

bit of time at this yellowish code on

[120:33]

the screen. But what's really going to

[120:35]

have the most effect is this same purple

[120:37]

line here and the white text within. And

[120:39]

we'll break down what all of these lines

[120:40]

mean over the next couple of weeks. But

[120:42]

sometimes we'll wave our hand at details

[120:44]

if we feel it's a little unnecessary at

[120:46]

this point in the story. And in fact,

[120:48]

let me get rid of the color coding for

[120:49]

now. And we'll see that this is the kind

[120:51]

of code, in a language called C, that we're

[120:53]

going to start playing with and using

[120:55]

today and for the next several weeks.

[120:57]

And indeed, it's representative of what

[120:59]

we're going to generally call source

[121:00]

code. So source code is what programmers

[121:02]

write. It's what you write. It's what

[121:04]

you wrote, albeit by dragging and

[121:05]

dropping puzzle pieces. This week

[121:07]

onward, you're going to start using your

[121:08]

keyboard all the more. And you're going

[121:09]

to write source code. So this is code

[121:11]

that we humans can understand with some

[121:13]

training and with some practice. But of

[121:16]

course per last week, what language do

[121:18]

computers ultimately understand? Only

[121:22]

>> so binary zeros and ones. And so you and

[121:25]

I, yes, can write code starting today in

[121:27]

a form that looks a little something

[121:29]

like this, which admittedly might look a

[121:30]

little arcane and cryptic, but it's

[121:33]

certainly better than a whole bunch of

[121:34]

zeros and ones. But we're going to write

[121:36]

in source code. But the machines that we

[121:38]

write code for ultimately only

[121:40]

understand these here, zeros and ones,

[121:42]

which may very well say hello world, but

[121:44]

we're going to call this moving forward

[121:46]

machine code. So machine code is what

[121:48]

the computers understand. Only the

[121:50]

zeros and ones. Source code is what you

[121:52]

and I understand and actually write. So

[121:54]

it stands to reason that we're going to

[121:55]

have to somehow translate one to the

[121:57]

other from source code to machine code.

[122:00]

And I alluded to this ever so briefly

[122:02]

last week, but we're going to use this

[122:04]

same mental model whereby the source

[122:06]

code we write might be the input to some

[122:08]

problem. The output we want therefrom

[122:09]

is going to be the machine code. So what

[122:12]

we're going to equip you with today

[122:13]

inside of this proverbial black box is a

[122:16]

special piece of software that takes

[122:17]

source code as input, produces machine

[122:20]

code as output, and that type of program

[122:22]

is called a compiler. And there's

[122:24]

bunches of different compilers in

[122:25]

the world. We're going to have you use

[122:27]

one of the most popular ones, but it's

[122:28]

simply a piece of software that someone

[122:30]

else wrote that converts one language to

[122:32]

another. Source code, for instance, in a

[122:34]

language called C to machine code, the

[122:37]

zeros and ones that our Macs, PCs,

[122:38]

phones, and other devices actually

[122:40]

understand. So, where are we going to do

[122:42]

this and how are we going to do this?

[122:43]

So, I promised last week that we'd

[122:45]

introduce you to this here tool, which I

[122:47]

used briefly at the very start of class

[122:48]

to whip up that chatbot. We're going to

[122:50]

use it though not for Python this week,

[122:52]

but indeed for a different language, C.

[122:54]

And indeed, this tool, Visual Studio

[122:56]

Code, or VS Code for short, is super

[122:58]

popular in industry. This is what real

[123:00]

programmers, so to speak, are using all

[123:02]

of the time nowadays. There's absolutely

[123:04]

alternatives. If some of you have

[123:05]

programmed before, you might have used

[123:07]

or experienced different tools, but this

[123:09]

is a very common tool that you'll see

[123:10]

even after CS50. And in fact, it's

[123:13]

something that ultimately you can

[123:14]

install for free on your own Macs and

[123:16]

PCs so that by the end of the course,

[123:18]

you're completely independent of CS50

[123:21]

and any CS50 related tools. But what we

[123:24]

have done for the very start of the

[123:25]

class is essentially provided you with a

[123:27]

cloud-based version of this tool. So all

[123:30]

you need is a web browser on any Mac or

[123:32]

PC or the like so that everything's

[123:34]

pre-installed for you, preconfigured for

[123:36]

you, and you don't have to deal with the

[123:37]

stupid technical support headaches at

[123:38]

the start of the term because it should

[123:40]

just work. But by the end of the term,

[123:42]

once you're a little more comfortable

[123:43]

with technology and with code in

[123:45]

particular, you can absolutely offboard

[123:47]

yourself from this tool. Install it,

[123:49]

download it on your own Mac and PC and

[123:51]

have pretty much the exact same

[123:52]

environment completely under your

[123:55]

control. So, starting today, you're

[123:56]

going to see an interface that looks

[123:58]

quite like this quite often. And we used

[124:00]

this same interface last week ever so

[124:01]

briefly. Moving forward, here's where

[124:03]

we're going to write code. At top right

[124:04]

is where one or more code tabs are going

[124:07]

to appear, similar to any tabbed uh

[124:09]

environment that you might use. Here,

[124:11]

for instance, is just a screenshot of

[124:12]

the first file we'll create today called

[124:14]

hello.c. The reason it's called hello.c

[124:17]

is because it's in a language called C,

[124:19]

as we soon shall see. No pun intended.

[124:22]

Meanwhile, the code here happens to be

[124:24]

color-coded, not quite in the same way as

[124:27]

you saw before, because I manually made it

[124:28]

look more like Scratch blocks. But among

[124:30]

the features that VS Code and other

[124:33]

programming environments provide is

[124:34]

something called syntax highlighting

[124:36]

whereby you don't worry about or even

[124:38]

think about these colors. But as you

[124:40]

write out code in a recognized language,

[124:43]

tools like VS Code will just color code

[124:45]

different parts of your code for you

[124:47]

just to make different features jump

[124:49]

out. And we'll see what those features

[124:50]

are over the course of today. But you'll

[124:52]

also spend a good amount of time, as I

[124:54]

briefly did last week, down here in the

[124:56]

bottom right of your screen, the

[124:57]

so-called terminal window, which is

[124:59]

going to be where you run commands for

[125:01]

compiling code and running code. And in

[125:03]

fact, as we'll see today, you're going

[125:05]

to start using your mouse and clicking a

[125:07]

little bit less. You're going to start

[125:08]

using your keyboard and typing a bit

[125:10]

more. And ultimately, even if at

[125:12]

first that might feel like a step

[125:13]

backwards to sort of not use something

[125:15]

that's so user friendly, the reality is

[125:17]

most every programmer tends to find

[125:19]

themselves ultimately much more

[125:21]

productive, much more powerful using the

[125:23]

keyboard more often, more quickly than

[125:26]

say a traditional mouse or trackpad

[125:28]

would allow. Meanwhile, we'll see some

[125:29]

somewhat familiar features here at left,

[125:31]

like this is where you'll see the files

[125:32]

and folders that you'll create over time.

[125:34]

At far left here is going to be an

[125:36]

activity bar, which is essentially a

[125:37]

modern form of a menu via which you can

[125:39]

open and close things and access other

[125:41]

features. For my purposes, I'll

[125:42]

generally hide this part here. I'll

[125:44]

generally hide this part here so that

[125:46]

when we're together, we're focusing

[125:47]

almost entirely on code and commands,

[125:49]

but I'm just typing some quick keyboard

[125:51]

shortcuts to simplify my own user

[125:54]

interface in that way. So, with all that

[125:57]

said, just some terminology. So this

[125:59]

whole collective environment that I'm

[126:01]

describing here is generally what's

[126:02]

known as a graphical user interface.

[126:04]

Why? Well, it's an interface for users

[126:06]

that's graphical in nature with icons

[126:08]

and buttons and the like. Shorthand

[126:10]

notation for this is "gooey," GUI for

[126:12]

short. But within this graphical user

[126:14]

interface, as promised, is going to be

[126:16]

that terminal window at bottom right

[126:18]

where I promised we would be typing most

[126:20]

of our commands. And just to give you a

[126:21]

bit more jargon in computing, that's

[126:23]

generally known as a command line

[126:25]

interface or CLI for short, whereby

[126:28]

you're typing commands into that

[126:30]

interface instead. And the world of

[126:32]

computing software is essentially

[126:34]

divided into GUIs and CLIs and

[126:36]

sometimes a piece of software might have

[126:38]

one of each as well. But without further

[126:41]

ado, why don't we go ahead and focus

[126:43]

entirely first on this here program,

[126:45]

which I dare say is the simplest program

[126:47]

you can write in a language like C and

[126:50]

see how we can actually compile and run

[126:52]

it together. So, I'm going to go over to

[126:54]

VS Code here where I've hidden my file

[126:56]

explorer with all the icons and I've

[126:58]

hidden my activity bar so that I only

[127:01]

have room for tabs of code and the

[127:03]

command prompt at the bottom. I'm

[127:05]

calling this a command prompt because

[127:06]

it's at this dollar sign where I'm going

[127:08]

to run some of my commands. And it's a

[127:10]

dollar sign by convention. It has

[127:12]

nothing to do with currency. It's just a

[127:13]

computing convention. Some systems will

[127:15]

use a caret symbol. Some systems will

[127:17]

use a greater-than symbol, rather, or

[127:20]

something else. But it just means type

[127:21]

your commands here. The first such

[127:23]

command I'm going to type is this: code

[127:26]

hello.c, with a single space in between.

[127:28]

I've not used any spaces in the name of

[127:31]

the file. I've not capitalized any

[127:32]

aspect of the file just because this is

[127:34]

convention. Unlike your Mac or PC where

[127:36]

you might be in the habit of naming

[127:38]

files with spaces and capitalization,

[127:40]

generally you'll make your life simpler

[127:42]

by just using lowercase and no spaces at

[127:44]

all. As soon as I hit enter, what you'll

[127:46]

see is that a brand new tab appears

[127:48]

called hello.c with a cursor blinking on

[127:51]

line one. And this is essentially VS

[127:53]

Code waiting for me now to type the

[127:55]

first line of my code. Notice though

[127:57]

that the command is complete, whereby

[127:59]

I have another cursor here, which,

[128:01]

if I click in the

[128:02]

terminal window and give foreground to

[128:04]

it, my cursor might blink there instead.

[128:06]

That just means I can type another

[128:07]

command when I am ready. So let's go

[128:09]

ahead and whip up this code. And I've

[128:11]

done this many times, so I can type it

[128:12]

fairly quickly. But in this tab, I'm going

[128:14]

to do #include <stdio.h>, so to

[128:17]

speak, int main(void), then, inside of

[128:20]

so-called curly braces, indenting therein

[128:23]

by four spaces, I'm going to say printf,

[128:26]

quote unquote, hello, world, backslash n,

[128:29]

close quote, semicolon. And voila, I've

[128:32]

written my first program in C. In a class

[128:34]

like this, no need to write down each and

[128:36]

every line of code that I write. In fact,

[128:37]

on the course's website will be copies

[128:39]

of everything that we've done, as well as

[128:40]

excerpts therefrom in the course's notes.

[128:43]

But you're welcome, but not expected, to

[128:45]

follow along in real time with what I am

[128:47]

typing here. So that's it. Like I've

[128:50]

written my very first program in C. If I

[128:52]

had done this on an actual Mac or PC

[128:54]

without a command line interface, I

[128:57]

might have a new icon on my desktop, so

[128:58]

to speak, called hello. And ideally, I

[129:01]

could double click on that or tap on it

[129:02]

and run the program. But because I'm in

[129:04]

this specific programming environment

[129:06]

that has a mix of a GUI and a CLI, I

[129:09]

actually need to click down in my

[129:10]

terminal window. And I need to now

[129:12]

compile this program first because at

[129:14]

this point in time, it exists only as

[129:16]

source code. So to do this, I'm going to

[129:19]

compile my code by very aptly saying

[129:21]

make space hello. And I'm pronouncing

[129:24]

the space, but literally I hit the space

[129:26]

bar. Make space hello as it sort of

[129:28]

implies semantically will make a program

[129:30]

called hello. Notice I have not said

[129:32]

hello.c again because the compiler,

[129:35]

let's call it make for now, even though

[129:37]

that's a bit of a white lie, is going to

[129:38]

infer that if I want to make a program

[129:40]

called hello, it's going to

[129:42]

automatically look for a file called

[129:44]

hello.c in this case. So, a bit of

[129:47]

magic. Enter. And remarkably, anytime

[129:50]

you don't see any output at a command

[129:52]

like this, that's probably a good thing.

[129:54]

Generally speaking, when you see output

[129:56]

when compiling your code, you have done

[129:58]

something wrong. Or in this case, I

[129:59]

might have done something wrong. But no

[130:01]

output is good because what I can now do

[130:03]

and this is a bit cryptic. I can run

[130:05]

this program not by double clicking or

[130:07]

tapping anywhere but by doing dot slash

[130:10]

hello with no spaces. And this is a bit

[130:13]

weird but what the dot slash means is

[130:15]

that, having just made a program called

[130:18]

hello that program is going to end up in

[130:20]

my current folder. It's somewhere in the

[130:22]

cloud. Yes, more on that in a bit. But

[130:23]

the program called hello is just

[130:25]

somewhere in my current folder. When I

[130:27]

say dot slash, that's like saying go

[130:29]

into the current folder and run the

[130:32]

program therein called hello

[130:34]

specifically. Now, as I often do, I'll

[130:36]

cross my fingers, hope that I didn't

[130:38]

mess this up in any way, and I should

[130:40]

see in a second hello world indeed

[130:43]

printed onto the screen. And so, just to

[130:46]

recap those then commands. One, I ran

[130:48]

code hello.c, which is a VS Code

[130:50]

specific thing. Code, short for VS Code,

[130:52]

just creates a new file called hello.c.

[130:54]

And then I'm on my way with my own

[130:56]

keyboard. Make hello compiles that

[130:59]

source code into machine code thereby

[131:02]

creating a new file called hello. And to

[131:04]

run that program hello, I type this

[131:07]

strange command, ./hello. But this is

[131:10]

a paradigm that, no matter what you call your

[131:12]

programs, we're going to see again and

[131:13]

again and again. So even if you've not

[131:15]

done something quite like this, it will

[131:17]

very quickly get familiar.

[131:20]

Yes. Questions.

[131:23]

>> When you say make hello,

[131:25]

how does the

[131:27]

computer know

[131:29]

what part of the code is ascribed to

[131:32]

hello?

[131:32]

>> Good question. When I say make hello,

[131:34]

how does the computer know what part of

[131:36]

the code is ascribed to this program

[131:37]

hello? It literally is going to take the

[131:40]

entire contents of hello.c and turn them

[131:43]

somehow into a program.

[131:44]

>> And does it have to be like named hello?

[131:47]

>> Does it have to be named hello? No. I

[131:48]

could have called it goodbye or anything

[131:50]

more, like myfirstprogram.c, anything at all

[131:52]

so long as I change these words here

[131:56]

accordingly.

[131:56]

>> But it has to like it needs to be like

[131:59]

from the same thing like it needs to

[132:01]

>> Yes.

[132:02]

>> have like green.c and make green or

[132:04]

whatever.

[132:04]

>> Exactly. If you change the name there

[132:06]

you need to change your commands

[132:07]

accordingly. Other questions on these

[132:09]

here steps?

[132:14]

No. All right. So let's tease apart what

[132:16]

it is we just did and like why this code

[132:18]

works in the way that it does. Well, to

[132:20]

recap, in Scratch, we had a program like

[132:22]

this. When the green flag was clicked,

[132:24]

we wanted to say hello world onto the

[132:25]

screen. The code that corresponds to

[132:27]

that is roughly here. And indeed, notice

[132:29]

that the yellowish or orangish code lines

[132:31]

up with the when green flag clicked. The

[132:33]

purple code here lines up with the say

[132:35]

block. And the white code inside of here

[132:37]

roughly corresponds to what was in the

[132:38]

white oval that we kept using again and

[132:40]

again last week. So, let's do more of a

[132:42]

one-to-one correspondence. And these

[132:43]

slides are deliberately designed to give

[132:45]

you again that sort of mental model of

[132:48]

taking the same ideas from last week and

[132:50]

just changing the syntax this week

[132:52]

onward. So when we have a function like

[132:54]

this thing here and recall that a

[132:56]

function is just an action or verb. It

[132:58]

sort of accomplishes a small piece of

[132:59]

work. In code, in C specifically, you're

[133:03]

going to type of course not a purple

[133:04]

puzzle piece but you're going to say the

[133:06]

word print. Well, more technically, printf,

[133:06]

where the f, as we'll soon see, means

[133:10]

format the printed output because this

[133:12]

is more powerful than just printing some

[133:14]

raw text alone. Then you can have

[133:15]

parentheses open and close left and

[133:17]

right. And notice that it's no accident

[133:19]

that MIT chose an oval for their

[133:22]

input to functions because it roughly

[133:24]

looks like the start of a parenthesis

[133:27]

and parenthesis on left and right.

[133:29]

Meanwhile, what goes inside of the

[133:30]

parentheses in the corresponding C code?

[133:32]

Well, at the end of the day, minimally

[133:34]

hello, world because that's literally

[133:36]

what we want to print to the screen. But

[133:38]

in C, unlike in Scratch, there's a bit

[133:40]

of overhead, a bit of additional syntax

[133:42]

that you just got to deal with to make

[133:44]

clear to the computer what you want to

[133:46]

print. In particular, you're going to

[133:48]

have to surround everything you want to

[133:50]

print with double quotes to make clear

[133:52]

that hello is not some special function

[133:55]

or variable or something else. It's

[133:56]

hello, world, the English phrase that

[133:59]

you want to print. So double quote here,

[134:01]

double quote there means here's the

[134:03]

beginning and the end of what I want to

[134:04]

print. You're also curiously going to

[134:06]

put a backslash n

[134:09]

in most cases at the end of the word or

[134:12]

words you want to print. We'll take that

[134:14]

away in a moment and see what it does.

[134:15]

And then lastly, and perhaps most

[134:17]

annoyingly in programming circles, you

[134:19]

have to finish your thought with a

[134:20]

semicolon. Much like in English, you

[134:22]

would finish most sentences with a

[134:23]

period instead. And the

[134:26]

thing about programming, with C in

[134:28]

particular, is that if you mess up almost any of

[134:30]

these details I just rattled off,

[134:32]

something's going to go wrong. And so

[134:34]

you're in good company. The very first

[134:35]

program you try to write or try to

[134:37]

compile, odds are it might not work

[134:39]

correctly because you'll develop over

[134:41]

time the muscle memory for spotting all

[134:42]

of these seemingly minor and actually

[134:45]

minor details, but that do matter to the

[134:48]

computer. All right. So if you're

[134:51]

familiar, of course, with the notation in

[134:52]

mathematics of functions, a

[134:54]

function in code is really the same idea

[134:56]

as a function in math whereby the

[134:58]

function f takes some input for instance

[135:00]

x and generally produces some output. So

[135:03]

if you're coming more from that

[135:04]

background realize that what we're

[135:05]

really doing here is roughly the same

[135:07]

but in code recall that we can have

[135:09]

different types of output. So if this is

[135:11]

our grand mental model and say we've got

[135:14]

a function inside of this black box

[135:16]

that takes arguments, that is to say as

[135:18]

its inputs, it can sometimes have side

[135:20]

effects. And recall that side effects

[135:21]

are often visual things that happen as a

[135:24]

result. They display on the screen.

[135:26]

Maybe it comes out of the speaker. It's

[135:27]

something generally ephemeral that just

[135:29]

happens. But it's not necessarily useful

[135:31]

in the same way as another type of

[135:33]

function that we'll return to in just a

[135:34]

bit. But last week, recall that we got

[135:36]

the cat with a speech bubble to uh

[135:38]

manifest on the screen and say hello

[135:40]

world in that speech bubble when the

[135:42]

input was hello world and the

[135:43]

corresponding function was instead say.

[135:46]

So let's see if we can't now tease apart

[135:50]

what the code we wrote is actually doing

[135:53]

for us bit by bit. So let me go back to

[135:55]

VS Code here and let me propose to break

[135:57]

this in a little way. Let me delete the

[135:59]

backslash n if only because at first

[136:01]

glance who knows or cares what that's

[136:03]

doing. Let's just get rid of it if we

[136:04]

don't understand it. I could now go back

[136:07]

down to my terminal window and I could

[136:09]

do ./hello, Enter, again. But there's

[136:12]

seemingly no change, which is good.

[136:15]

Doesn't seem like I broke it, but I've

[136:17]

kind of misled you here. Why?

[136:21]

Why did nothing seem to change?

[136:24]

I didn't recompile it. So, recall that

[136:26]

the compiler converts source code to

[136:29]

machine code, but I already did that a

[136:30]

couple of minutes ago. If I've changed

[136:31]

the source code, it stands to reason

[136:33]

that I need to recompile the code to

[136:36]

actually see the effects of that. So,

[136:38]

let me do that again. Make hello enter.

[136:41]

Nothing seems to have gone wrong, but

[136:43]

let me now do ./hello, Enter. And it's

[136:47]

subtle now. And in fact, let me go ahead

[136:48]

and zoom in. It's really just an

[136:50]

aesthetic bug insofar as functionally

[136:53]

the program is still technically

[136:54]

printing hello world. But what's

[136:57]

seemingly wrong? Or put another way,

[136:59]

what did the backslash n apparently do?

[137:01]

Yeah.

[137:03]

>> Yeah. So, it's somehow giving me a new

[137:05]

line. And that's essentially what the

[137:06]

backslash n denotes: give me a new

[137:08]

line there. And why was I doing that?

[137:10]

Well, really just for the aesthetics.

[137:11]

Like if this dollar sign represents my

[137:13]

prompt where I type commands. If

[137:15]

anything, it just looks kind of stupid

[137:16]

that I finished a program over here and

[137:18]

then the prompt is on the same line. It

[137:20]

just looks wrong. Even though you could

[137:22]

sort of argue that was my intent, even

[137:23]

though in this case it wasn't. So, what

[137:25]

would the alternative be? Well, what

[137:27]

you're seeing here is what's actually

[137:29]

generally known as an escape sequence,

[137:31]

which are sort of special sequences

[137:35]

of symbols like backslash and n in this

[137:38]

case that do a little something unusual.

[137:40]

And here's just a non-exhaustive list of

[137:42]

some you'll encounter in the real world

[137:44]

and including in CS50. Backslash n moves

[137:47]

you to a new line. Backslash r is a

[137:49]

so-called carriage return. If you've

[137:50]

ever seen or used an old school

[137:52]

typewriter, this refers to the process

[137:53]

of bringing the typing head back to the

[137:56]

left end. So it sort of moves the cursor

[137:58]

horizontally as opposed to vertically.

[138:00]

This one's interesting. Backslash

[138:02]

double quote.

[138:04]

Why does there exist this pattern?

[138:08]

Backslash double quote. Yeah.

[138:10]

>> If you just write double quote, it

[138:12]

closes the

[138:14]

>> exactly. So recall that phrase we tried

[138:16]

to print out, like hello, world.

[138:19]

If for some reason you didn't want to

[138:20]

say hello world, but you wanted to say

[138:22]

something, sort of snarkily, like hello,

[138:24]

"world" or something like that, well, you

[138:27]

can't put a quote, a quote, a quote, and a

[138:29]

quote and expect the computer to know

[138:31]

which quote corresponds to what. It's

[138:33]

just arguably ambiguous. So if inside of

[138:36]

double quotes, you actually want to

[138:38]

print actual double quotes, this is an

[138:41]

escape sequence that tells the computer,

[138:43]

this is not some quote delineating

[138:46]

where my thought begins and ends. This

[138:48]

is literally a double quote. And we'll

[138:50]

see other situations in which a single

[138:52]

quote or apostrophe is the same. We'll

[138:53]

see crazy situations in which you want

[138:55]

to print a backslash, but backslash

[138:57]

already has some special meaning. So

[138:58]

there's solutions to all of these

[138:59]

problems. But let's not get too far into

[139:01]

the weeds here. But let me go back to

[139:03]

the code and propose what the

[139:04]

alternative otherwise might have been.

[139:06]

If I didn't know about backslash n, my

[139:09]

instinct to move the cursor to the next

[139:11]

line might have been literally to just

[139:12]

like hit enter or do something like

[139:15]

this, like move the double quote, move

[139:17]

the parenthesis, move the semicolon on

[139:19]

to the next line. But this should start

[139:20]

to rub you the wrong way. And indeed,

[139:22]

this violates a principle of most

[139:24]

programming languages in that most

[139:26]

programming languages are line-based. You

[139:29]

sort of start and finish your thought

[139:30]

ideally on the same line. And this runs

[139:33]

afoul of that. And two, even if you're

[139:35]

seeing code for the first time, assume

[139:37]

that this just looks stupid as well to

[139:40]

sort of move part of your thought to the

[139:41]

next line, it just looks a little

[139:43]

sloppy. And it is. So C and many other

[139:45]

languages, Python among them, solve this

[139:48]

by giving you these so-called escape

[139:49]

sequences. So if you want a new line

[139:51]

there, you do backslash n and you will

[139:53]

get your new line there. Now, that's a

[139:55]

bit of an overstatement what I said in

[139:57]

that sometimes lines of code will be so

[139:59]

long that they do wrap onto multiple

[140:01]

lines, but generally that's a convention

[140:03]

that we're going to try to avoid. All

[140:06]

right, what else could go wrong? Well,

[140:07]

let's do this. Let me go ahead and clear

[140:09]

my terminal window, which I can do by

[140:10]

hitting Control-L, or I can literally type

[140:13]

clear. And I'm going to frequently do

[140:15]

this just to keep the screen clear, even

[140:17]

though it has no functional impact. It's

[140:18]

just an aesthetic. Let me do something

[140:20]

else accidentally. Suppose I forgot to

[140:23]

finish my thought and I omitted the

[140:25]

semicolon, but otherwise the code is

[140:26]

perfect. Let me do make hello. Now

[140:29]

enter. Now we're going to see some

[140:32]

output that's a little more arcane. Let

[140:34]

me go ahead and scroll back up here to

[140:36]

make clear that what's just happened is

[140:37]

I ran make hello, but I didn't get back

[140:40]

to another prompt. I don't see

[140:41]

immediately a dollar sign because

[140:43]

there's an error message here that is

[140:44]

almost as long as the code I tried to

[140:46]

write. Not to worry. Let's see. Here is

[140:49]

the name of the file in which the

[140:50]

problem exists. Stands to reason that

[140:52]

it's in hello.c. Here is the line

[140:55]

number in which the problem seems to

[140:57]

exist. Line five. And that's helpful

[140:59]

because it lines up with this. And then

[141:01]

if you care to count, this is the

[141:03]

29th character. So if I count from left

[141:05]

to right around character 29, something

[141:08]

is wrong. Something is missing. So it's

[141:10]

a pretty decent error message. In fact,

[141:12]

it even says expected semicolon after

[141:14]

expression. There's a little green

[141:16]

caret symbol pointing me at the

[141:17]

mistake. So this is, again,

[141:20]

another value of the compiler. Not only

[141:22]

does it know how to convert source

[141:24]

code to machine code, it's also pretty

[141:26]

good at finding mistakes in your code

[141:28]

and trying to draw your attention to

[141:30]

them. So how do I fix this? Well,

[141:32]

assuming you've understood the error

[141:34]

message at this point. Well, you just go

[141:35]

back in, add the semicolon. Let me go

[141:38]

back down to my terminal window. I'm

[141:39]

going to clear it just to clean up the

[141:41]

mess. Let me rerun make hello. And now

[141:44]

we are back in business. And indeed, if

[141:46]

I do ./hello, I've got hello, world back

[141:49]

on the screen. Well, let's make one

[141:52]

other mistake. Suppose that I forgot, as

[141:54]

you sometimes will, to include this line

[141:57]

at the top, which will make more sense

[141:58]

next week, but for now, let's just omit

[141:59]

it and dive right into the code. You

[142:01]

would think this is enough, just

[142:03]

printing out hello world. Well, here,

[142:05]

let me go back down to my terminal

[142:06]

window. Let me do make hello again now.

[142:09]

And I'm going to get a whole different

[142:10]

error message instead. So now the problem is

[142:14]

still with hello.c. That makes sense.

[142:16]

Line three. Okay. So somewhere in there

[142:19]

printf is suddenly the problem even

[142:21]

though the semicolon is back and the

[142:23]

backslash n is back. So let's keep

[142:24]

reading. Error: call to undeclared

[142:27]

library function printf with type int.

[142:30]

And then this is a whole mouthful. So,

[142:32]

here is an example of an error message

[142:34]

that unless you're sort of conditioned

[142:36]

to know what this means and you've seen

[142:37]

it before, it's quite a bit more cryptic and

[142:40]

unclear like what the solution to the

[142:42]

problem is, especially when the rest of

[142:43]

your code is truly correct. I've just

[142:46]

forgotten something stupid. But how can

[142:48]

I sort of think about this problem?

[142:50]

Well, it turns out that another feature

[142:53]

of C is that it comes with a bunch of

[142:55]

header files. A bunch of files whose

[142:58]

names don't end in .c, but end in .h. And

[143:02]

these so-called header files which end

[143:04]

in .h contain code that other people

[143:08]

wrote that you can use in your own

[143:10]

programs. So for instance in this

[143:13]

particular case a header file is giving

[143:15]

us access to what's more generally in

[143:17]

computing called a library. A library is

[143:20]

code someone else wrote that you can

[143:21]

use. And I actually used a library last

[143:23]

week when I did that import line and

[143:25]

mentioned OpenAI, the company. I was

[143:28]

actually using a library from that

[143:30]

company that I had automatically

[143:32]

downloaded and installed into my

[143:34]

programming environment in advance of

[143:35]

class because I don't know how to

[143:37]

implement a chatbot without standing on

[143:38]

their shoulders and using a lot of the

[143:40]

code they themselves wrote. Same idea

[143:42]

here. Even though printf is a feature

[143:45]

of C, if you want to use it, you have to

[143:48]

include that library by telling your

[143:51]

program to include the header file that

[143:55]

defines that function. And you only know

[143:57]

this by being taught it or looking it up

[143:58]

in a book or a reference. But in this

[144:00]

case, I wanted to use a header file

[144:03]

called standard io.h, stdio.h.

[144:07]

Um, it is not studio.h.

[144:10]

This is a very common bug online. Um, if

[144:13]

you find yourself typing studio.h, typo,

[144:16]

it's standard io.h.

[144:18]

And in that file then is defined the

[144:22]

printf function. So, if I go back to my

[144:24]

code here, the solution to this problem

[144:26]

truly is to just undo the deletion I

[144:28]

made a moment ago. Because what line one

[144:30]

is now doing for me is it's telling the

[144:31]

compiler, oh, by the way, I didn't write

[144:33]

all the code that I'm about to use.

[144:36]

Please include the definition of printf

[144:39]

from this other file called standard

[144:41]

io.h. And again, you'd only know this by

[144:43]

looking it up in a reference, attending

[144:44]

a lecture or something like that. It's

[144:46]

not obvious otherwise, but these are the

[144:48]

kinds of things you very quickly look

[144:50]

up. So, where do you look them up? Well,

[144:52]

it turns out the ecosystem of C has, you

[144:55]

know, hundreds of books you can buy or

[144:57]

download, many, many, many websites.

[144:59]

Among them is one of CS50's own. And in

[145:01]

fact, the conventional way to look stuff

[145:04]

up for the programming language called C

[145:05]

is to look at the official manual pages

[145:08]

or man pages for short for the C

[145:11]

language. Unfortunately, many of them

[145:13]

were written decades ago and they were

[145:14]

certainly written by fairly advanced

[145:16]

programmers and not for a broad

[145:17]

audience. And so what we have done is

[145:19]

imported all of that freely available

[145:22]

documentation, hosted it at our own

[145:24]

URL here manual.cs50.io

[145:27]

and we've essentially simplified it for

[145:28]

those less comfortable those of you who

[145:30]

might be less familiar with less

[145:32]

comfortable with technology and really

[145:33]

for most people who aren't used to

[145:35]

reading manual pages. It's just useful

[145:37]

to have it written in teaching assistant

[145:39]

like language instead. So for instance

[145:41]

if you go to a URL like this you'll see

[145:43]

CS50's documentation for this official

[145:46]

library, stdio.h, that comes with

[145:49]

C itself. If you get a URL like this,

[145:51]

you can look up the documentation for

[145:52]

printf itself specifically. So for

[145:55]

instance, let me go ahead and just give

[145:56]

you a teaser for this. If I were to do

[145:58]

the same on my own computer, I might see

[146:01]

the CS50 manual pages here and you'll

[146:03]

see header file by header file a bunch

[146:06]

of frequently used functions in CS50.

[146:08]

We've also filtered the list down from a

[146:10]

massive list to a much shorter list so

[146:12]

that you can sort of see what's most

[146:14]

likely useful to you. If you go to a

[146:16]

specific page like standard io.h, you'll

[146:18]

see for instance here just over a

[146:20]

half dozen functions that we won't touch

[146:22]

on today beyond printf, but that

[146:24]

we'll see in the class over time that

[146:25]

do useful stuff. For instance, printf

[146:28]

prints to the screen. And we'll see

[146:29]

other functions for opening files,

[146:31]

closing files, and the like because all

[146:33]

of that's related to standard IO input

[146:35]

and output. If I go to a specific man

[146:38]

page for uh this uh header file, you'll

[146:42]

see the standard formatting for these

[146:44]

pages. So, here's the name of the

[146:45]

function, printf, and it prints to the

[146:46]

screen. You'll see a synopsis, and this

[146:48]

indeed indicates we're in less

[146:50]

comfortable mode. If you want to see the

[146:51]

original, more arcane documentation,

[146:53]

just uncheck that, and you'll see the

[146:55]

original official documentation, but

[146:57]

you'll see a mention of like what header

[146:58]

file this function is defined in so that

[147:00]

you know what file to use in your own

[147:02]

code. You'll see a so-called prototype,

[147:04]

which is just the first line of code

[147:07]

from that function. More on that in just

[147:09]

a little bit. You'll see an English

[147:10]

description. You'll see example code.

[147:11]

Long story short, this is the

[147:13]

authoritative answer. And even though

[147:15]

you have access in this class to the

[147:17]

virtual rubber duck at CS50.AI and in

[147:20]

other forms of it that you'll soon see,

[147:22]

you should also have the tendency and

[147:24]

the instinct moving forward to check

[147:26]

the official documentation. And all of

[147:28]

today's AIs are trained on things like

[147:31]

the official documentation. So that's

[147:33]

the source material that any of these

[147:35]

AIs, the duck among them,

[147:37]

are actually relying on. But what we're

[147:40]

also going to see is that besides these

[147:42]

official functions, there's some that

[147:43]

CS50 itself has invented. We use these

[147:46]

really as training wheels for just the

[147:47]

first few weeks of the course and then

[147:49]

we take these training wheels off. But

[147:50]

the reality is in a language like C,

[147:53]

certain stuff is just really hard or

[147:55]

annoying to do. Certainly if you're

[147:56]

learning how to program for the very

[147:58]

first time or at least you are new to C.

[148:00]

We'll eventually show you how to do it

[148:02]

that way. But even if you just want to

[148:03]

get input from the user like a string of

[148:06]

text or a number of some sort, it's

[148:08]

generally not that easy to do in C, at

[148:10]

least in these early days. So for

[148:12]

instance, at this URL here, you can see

[148:13]

documentation for CS50's own library and

[148:16]

CS50's own header file, cs50.h. And

[148:19]

you'll see such functions in the

[148:20]

documentation as these: get_string,

[148:23]

get_int, get_char, and a bunch of others as

[148:25]

well. And we'll touch on those this

[148:27]

week. But it will ultimately be a way of

[148:30]

just getting useful work done quickly by

[148:33]

standing on our shoulders and actually

[148:36]

uh using functions we wrote to then

[148:38]

solve problems of interest to you. So

[148:42]

let's focus for instance on one of these

[148:43]

first: get_string. A string in

[148:46]

programming speak means text. Zero or

[148:49]

more characters of text like h e l l o

[148:51]

comma space w o r l d. That is a string

[148:55]

of text in computer speak. And it's

[148:57]

obviously not a number like 50. It's

[148:59]

actual text that you would type on the

[149:00]

keyboard. We'll see then what other

[149:02]

things we want to get. But with

[149:04]

this function, we can start to replicate

[149:06]

another program that we implemented

[149:08]

pretty quickly last week in Scratch. So

[149:10]

recall that in Scratch, this one was a

[149:12]

little more interactive. I used another

[149:13]

blue puzzle piece ask to actually get

[149:16]

input from the user. And recall that

[149:18]

unlike the printf function today

[149:21]

and the say block last week, this time

[149:24]

we still have the same input output

[149:26]

model, but if we pass in arguments to a

[149:29]

function uh that we're about to see, you

[149:32]

can get back not just a side effect

[149:34]

sometimes, but a return value like a

[149:36]

useful reusable value like the person's

[149:39]

name as we'll soon see. All right, so

[149:41]

let's actually do this. If in Scratch

[149:43]

the equivalent was asking the user,

[149:45]

what's your name? asking them that and

[149:47]

then waiting for an answer that we can

[149:49]

store in a variable. Let me propose that

[149:51]

in C side by side it's going to look a

[149:53]

little something like this. Instead at

[149:55]

left we have the scratch block the ask

[149:57]

function here is the argument there too

[149:59]

and then it and wait just means it's

[150:01]

going to wait till the user finishes

[150:02]

typing. If I want to translate this to C

[150:05]

now today moving forward well it looks a

[150:07]

little something like this. The closest

[150:09]

analog in C thanks to CS50's library is

[150:13]

going to be a function called

[150:14]

get_string. So there's no C function called

[150:16]

ask. And we deliberately named this

[150:18]

function get_string just to make super

[150:20]

clear what it is you are getting. A

[150:22]

string of text in this case. And we've

[150:24]

got the parentheses ready to go

[150:25]

indicative of this white oval for user

[150:27]

input. If I want to prompt the user with

[150:30]

that same phrase, what's your name?

[150:32]

Well, I can just put it inside of those

[150:34]

parentheses. But what next do I need to

[150:36]

add around my user input? Um, you did

[150:40]

the quotation marks.

[150:41]

>> Yeah, I need the quotation marks just to

[150:43]

make clear that these aren't special

[150:45]

individual words. This is a whole phrase

[150:48]

that I want to be displayed to the user.

[150:50]

So, I'm going to indeed put double

[150:52]

quotes around everything. And this is

[150:53]

just an aesthetic. I don't in this case

[150:56]

want to bother moving the cursor to the

[150:57]

next line. Like, I want the user to see

[150:59]

the question and I want the cursor to

[151:01]

just stay there blinking waiting for

[151:03]

their prompt. But I don't want the

[151:04]

cursor to be right next to the question

[151:06]

mark. So, I'm deliberately just leaving

[151:08]

a single white space there just to kind

[151:10]

of scooch it over a bit so it looks a

[151:11]

little prettier, at least to my eye.

[151:13]

Now, we're not done yet because we need

[151:15]

to do something with this value. The

[151:18]

get_string function, as we'll soon see, is

[151:19]

going to prompt the user for me to type

[151:21]

something in like my name. But where do

[151:23]

I want to put that? Well, MIT has the

[151:25]

answer put in a variable called answer.

[151:28]

And you can't rename that in Scratch.

[151:30]

It's just defined as answer. But in C,

[151:34]

what I'm going to need to do is

[151:35]

something like this. If you want to keep

[151:37]

return values around from a function,

[151:40]

you literally use an equal sign and then

[151:43]

to the left of it, you put the name of

[151:46]

the variable into which you want to put

[151:48]

that return value. So in mathematics, we

[151:51]

would use X, Y, and Z as our variables.

[151:53]

Again, in code, as in Scratch, you can

[151:55]

name your variables anything you want.

[151:57]

By convention, they should usually be

[151:58]

lowercase. They should not have spaces

[152:00]

therein, similar to file names. But this

[152:02]

is a pretty good analog now of what's

[152:04]

going on collectively here. But C is a

[152:07]

little more precise. You can't just

[152:10]

give the variable a name. You need to

[152:12]

tell C or really the compiler what type

[152:16]

of value you want to put in this

[152:19]

variable. So if it's a string of text,

[152:20]

you put string. If it's a number, you're

[152:22]

going to put something else. But for

[152:23]

now, it's a string. Per the function's

[152:26]

name, it's going to give me a string.

[152:28]

Now, we're so close to finishing this

[152:30]

comparison. There's one detail missing.

[152:33]

What's still missing from the code here?

[152:36]

Yeah.

[152:37]

>> Yeah. So, we have to finish the thought

[152:39]

lastly with a semicolon. So, if you're

[152:41]

getting to sort of the point already,

[152:42]

like this is one of the reasons why we

[152:44]

start with Scratch: you sort of get

[152:45]

the intuition pretty quickly. And even

[152:47]

though nothing on the right hand side is

[152:48]

particularly hard, there's just all

[152:49]

these stupid little details that you

[152:51]

have to ingrain in yourself over time.

[152:53]

In this case for C, but for many

[152:54]

programming languages, we're going to

[152:56]

see the similar paradigm. But among the

[152:58]

goals of the course too are to show you

[152:59]

how ultimately languages have been

[153:01]

evolving. And so one of the things we'll

[153:02]

see in Python in a few weeks' time is that

[153:04]

some of this syntax actually goes away

[153:07]

because over time humans have gotten

[153:09]

annoyed at older languages like this.

[153:10]

Like why the heck do I have to keep

[153:12]

putting a semicolon when it's clear that

[153:13]

I'm at the end of the line. So we'll see

[153:15]

among languages like Python we can get

[153:17]

rid of some of these same features. But

[153:19]

for now it's just a matter of

[153:21]

remembering what goes where. All right.

[153:23]

So, let's go ahead now and take that

[153:25]

same idea of converting Scratch to C and

[153:28]

actually do something with this code.

[153:30]

Let me go back to VS Code here. I'm

[153:32]

going to keep my file name the same, but

[153:33]

what you'll see on CS50's website is

[153:35]

that we'll add version numbers to each

[153:37]

of the examples that I'm typing out. So,

[153:39]

you can actually see the progression of

[153:40]

these programs, even though we're not

[153:42]

changing the name. And what I'm going to

[153:43]

go ahead and do here, for instance, in

[153:46]

hello.c this time, is the following. I'm

[153:50]

going to go ahead and uh first get rid

[153:53]

of the single hello world. I'm going to

[153:55]

go up here and include this time cs50.h.

[153:58]

So, not one but two header files. And

[154:00]

then inside of my curly braces, inside

[154:02]

the so-called main function, as we'll

[154:05]

soon call it, I'm going to go ahead and

[154:06]

do this. Exactly the same line of code

[154:09]

as on the screen before, I'm going to

[154:11]

get a string prompting the user for

[154:13]

what's your name question mark space

[154:15]

close quote semicolon. And as an aside,

[154:19]

this will, we'll soon see, print on the

[154:21]

screen what's your name. So that implies

[154:24]

that the get_string function is actually

[154:26]

using printf itself to print out that

[154:30]

message. I do not need to use printf to

[154:32]

display that message on the screen

[154:33]

because I read the documentation for

[154:35]

CS50's get_string function and I just

[154:36]

know that it is using printf for me to

[154:39]

achieve that particular goal. Now let me

[154:41]

do something intuitive but not quite

[154:43]

correct. If I want to print out that

[154:45]

answer so that the expression is going

[154:47]

to be not hello world but hello David or

[154:49]

hello Kelly. Let me go ahead and say

[154:51]

hello,

[154:53]

answer, backslash n to move the cursor

[154:56]

down as before. semicolon. So this is

[154:58]

not quite right. And even if you've

[155:00]

never programmed before, you can perhaps

[155:01]

see where this is erroneously going. Let

[155:04]

me remake the program because I've

[155:06]

changed the source code and I need new

[155:07]

machine code. Nothing seems to be wrong

[155:10]

syntactically, at least.

[155:12]

But if I do now

[155:14]

./hello and hit enter, you'll see I'm

[155:17]

being prompted: What's your name? So I'm

[155:18]

going to go ahead and type in David and

[155:20]

then hit enter. But when I do, if you

[155:23]

know where this is going, what am I

[155:24]

going to see instead?

[155:26]

>> Hello answer. And the computer's just

[155:29]

doing literally what I told it to do. I

[155:30]

said quote unquote print out hello

[155:32]

answer. But obviously that's not the

[155:34]

goal that I have in mind. So how do I

[155:36]

actually work around that? Well, what I

[155:38]

really need to do is achieve the

[155:40]

equivalent of this thing here, which we

[155:42]

did by stacking blocks in Scratch or

[155:44]

nesting them, if you will, one inside of

[155:46]

the other. So, I want to join the

[155:48]

expression hello, space, and that

[155:51]

answer. And it turns out in C, you can't

[155:52]

do it quite like this. Like, there isn't

[155:54]

an analog of the join function, at least

[155:56]

that we'll see today. So, we have to do

[155:59]

this a little bit differently. We can do

[156:01]

it though by maybe telling the computer,

[156:03]

we'll go ahead and print out hello,

[156:05]

comma, space, and then maybe we can give

[156:07]

it like a placeholder to plug in the

[156:10]

name once we know the name. Because when

[156:12]

I'm writing my code, I have no idea

[156:13]

who's going to play this game, me or

[156:15]

Kelly or someone else. So, what if we

[156:17]

use special syntax to indicate where I

[156:20]

want the person's name actually to go?

[156:22]

Let me propose that we now do this.

[156:25]

instead of printing out hello quote

[156:28]

unquote uh hello comma answer quote

[156:30]

unquote let's go ahead and start

[156:32]

printing out something and I got my

[156:34]

parentheses ready to go and I did my

[156:35]

semicolon in advance this time I want to

[156:37]

somehow now say hello placeholder and

[156:40]

you would only know this by someone

[156:42]

having told you or a reference online

[156:44]

%s is the placeholder for a

[156:46]

string that you don't know when you're

[156:48]

writing the code but when someone else

[156:50]

is running the code it will be filled in

[156:53]

and substituted for other input. So,

[156:55]

hello, %s is the closest we can

[156:57]

get to this. I still need though some

[156:59]

other syntax. I still do need those

[157:02]

quotes on the left and the right just to

[157:05]

be aesthetically pleasing. I'm going

[157:07]

to put a backslash n there at the end to

[157:09]

move the cursor, but now I've left room

[157:11]

in my parentheses for one more thing.

[157:13]

And you can perhaps guess where I'm

[157:14]

going with this. Again, even if you've

[157:15]

never programmed before, this is telling

[157:17]

printf: print out h-e-l-l-o, comma, space,

[157:21]

something. What should I probably pass

[157:24]

in to these parentheses as a second

[157:27]

input so that printf knows what that

[157:29]

something is?

[157:31]

Yeah,

[157:31]

>> the variable.

[157:32]

>> The variable name. So the variable in

[157:35]

which I have the user's name and indeed

[157:38]

the convention is to put a comma after

[157:40]

the quotes and then the name of the

[157:42]

variable that has the value you want to

[157:44]

be substituted for that placeholder. Now

[157:47]

notice there's a collision of syntax and

[157:49]

grammar here. The comma inside of the

[157:51]

quotes is just an English thing. Hello,

[157:53]

comma, so and so. The comma outside of

[157:56]

the quotes is meaningful to C because it

[157:59]

delineates which is the first input or

[158:01]

argument to the left and which now is the

[158:04]

second. And we haven't seen this before

[158:05]

in C. Up until now, we've only been

[158:07]

passing one input, but you can pass in

[158:08]

two or three or four. Completely depends

[158:10]

on what the function is designed to

[158:13]

expect. So, let me put this all together

[158:16]

now. Let me go back to VS Code.

[158:18]

Previously, we were literally printing

[158:20]

out answer, but I can change answer to

[158:22]

%s. I can move my cursor outside

[158:25]

of those quotes, comma, answer, because

[158:28]

that's the name I gave to that variable.

[158:30]

I can go back down to my terminal window

[158:31]

and clear it just to reduce clutter. Let

[158:34]

me do make hello one more time. Seems to

[158:36]

work. ./hello. Enter. D-A-V-I-D. And now

[158:41]

hello,

[158:43]

David is printed.

[158:46]

Okay, questions on any and all of that.

[158:49]

>> I was wondering with the header file,

[158:52]

where is it pulling from?

[158:53]

>> Good question. Where is it pulling these

[158:54]

header files from? So, what you are

[158:57]

seeing here is a graphical user

[158:59]

interface that's somewhere hosted in the

[159:00]

cloud at cs50.dev, the URL I mentioned

[159:03]

last week, and we're going to tease this

[159:04]

apart in just a moment. That software is

[159:08]

running on a computer, and that

[159:09]

computer's got a hard drive or a solid

[159:10]

state drive, like folders of storage.

[159:12]

Those files, cs50.h and stdio.h

[159:15]

and many more, are pre-installed on

[159:17]

the server to which I have connected and

[159:19]

they're stored in a standard place so

[159:21]

that the compiler in particular knows

[159:22]

where to look for them and those are all

[159:24]

things we did in advance for you. Yeah.

[159:27]

>> Why does backslash n not create a new, like,

[159:30]

a new line?

[159:31]

>> Why does the backslash n not create a

[159:33]

new line? So it is. Backslash n is

[159:36]

essentially being printed here which has

[159:38]

the effect of pushing the dollar sign to

[159:40]

the next line. Otherwise, the dollar

[159:42]

sign would stay on that second to last

[159:44]

line. Other questions?

[159:46]

>> Why is there no backslash n on this?

[159:49]

>> Good. Uh, why is there no backslash n

[159:51]

over here?

[159:52]

>> Good question. My choice as the

[159:54]

programmer. I just wanted to see the

[159:55]

sentence, what's your name? And I wanted

[159:57]

the user me to type my name immediately

[160:00]

after it like this. But I didn't have to

[160:03]

do it that way. I just wanted to show

[160:04]

you the difference.

[160:05]

>> Gotcha. And then also like just

[160:07]

generally when we're like doing the work

[160:09]

should we always write the like first

[160:11]

four lines.

[160:13]

>> Should you always write the first four?

[160:14]

Oh these. Yes. For today trust me do

[160:18]

this, do this, do this, do this. And

[160:20]

next week we'll understand even more

[160:21]

what those lines do. However, slight

[160:23]

caveat only use cs50.h if you're using

[160:26]

one of our functions. Clearly you don't

[160:28]

need cs50.h if you're just printing

[160:30]

something out as in the first example.

[160:32]

popular answer was 255

[1457:32]

which I think if we click once more

[1457:34]

we'll confirm was in fact the correct

[1457:36]

answer. So why is that and why is it not

[1457:39]

256? Well if we start counting from zero

[1457:41]

as we always have that's consuming one

[1457:44]

of the 256 possibilities. So the largest

[1457:47]

number that we can represent with a value that's

[1457:48]

8-bit and unsigned, which means no

[1457:50]

negative numbers involved is indeed

[1457:52]

going to be 255.

[1457:54]

Treasure that information, now and always.

[1457:57]

All right, next question from Kelly.

[1457:59]

Which issue is at the center of the year

[1458:02]

2038 problem, which hopefully you added

[1458:04]

to your Google calendars a few weeks

[1458:06]

back. Integer overflow, malicious

[1458:08]

inputs, SQL injection attacks, or memory

[1458:10]

leak.

[1458:13]

Which of those is at the core of the

[1458:15]

year 2038 problem?

[1458:19]

All right, let's go ahead and reveal the

[1458:24]

number one answer with 92% of you saying

[1458:27]

integer overflow is in fact correct

[1458:31]

because we're still in the habit of

[1458:32]

using 32-bit integers to keep track of

[1458:35]

time from the so-called epoch which was

[1458:36]

January 1st, 1970. And unfortunately, we

[1458:40]

humans aren't great at sort of planning

[1458:41]

ahead. And so we're going to run out of

[1458:43]

permutations of 32 bits by a certain date

[1458:46]

in the year 2038 unless everyone

[1458:48]

upgrades their computers to 64-bit

[1458:51]

counters which thankfully most every

[1458:53]

piece of modern hardware nowadays is

[1458:55]

using already. Your Macs, your PCs, and

[1458:57]

your phones. So hopefully this will be

[1458:59]

really a non-event, but hopefully you'll

[1459:00]

think of us in CS50 in uh you know 10

[1459:03]

plus years when your Google calendar

[1459:05]

reminder goes off. Question three, which

[1459:08]

of the following is not a step of

[1459:10]

compiling? Linking, pre-processing,

[1459:12]

assembling, or interpreting?

[1459:16]

Bit more of a challenge. Which of these

[1459:18]

is not a step of compiling?

[1459:22]

All right, almost 200 responses coming

[1459:24]

in.

[1459:27]

All right, why don't we go ahead and

[1459:28]

reveal the most popular answer with 54%

[1459:31]

of you saying interpreting is in fact

[1459:35]

correct. Recall that we talked about

[1459:37]

compiling. Compiling itself is just one

[1459:39]

of several steps. There is in fact the

[1459:41]

pre-processing step which takes care of

[1459:43]

any of the hash symbols in C that start

[1459:46]

with #include, #define, and the

[1459:48]

like. That's pre-processing. Uh there

[1459:50]

was then assembling or there was then

[1459:52]

compiling which actually compiled your

[1459:54]

code into assembly code. There was then

[1459:57]

the assembler which would actually take

[1460:00]

it down further to machine code and then

[1460:03]

linking

[1460:04]

29. This is for 29% of you. The linking

[1460:07]

step, recall, was taking your zeros and

[1460:09]

ones and combining them with say CS50's

[1460:11]

library's zeros and ones and maybe the

[1460:13]

standard I/O library's zeros and ones,

[1460:15]

linking them all together to give you

[1460:17]

one executable program like hello uh

[1460:20]

itself. All right, next question. What

[1460:23]

does a pointer store? The name of a

[1460:25]

variable, the memory address of a

[1460:27]

value, the size of a value, or the value

[1460:30]

of a variable?

[1460:33]

Think for a moment.

[1460:36]

What does a pointer store?

[1460:41]

All right, about 200 responses in and

[1460:43]

yes, the memory address of a variable

[1460:45]

with 96% of you confirming as much. That

[1460:48]

is correct. Question five.

[1460:54]

What is the running time of linear

[1460:56]

search? Big O of 1, big O of N, big O of

[1460:58]

N squared, or big O of N log N? linear

[1461:03]

search running time.

[1461:08]

And recall that with something like

[1461:10]

search, you could get lucky. But if big

[1461:13]

O is the upper bound on our running

[1461:15]

time, you might not. You might hit the

[1461:16]

end of the list that you're searching.

[1461:19]

And so the running time of linear search

[1461:21]

is of course big O of N. It might be

[1461:26]

omega of one, but not big O of one. At

[1461:28]

least if we're considering what the

[1461:30]

worst case scenarios might be. All

[1461:31]

right, on to question six. Which

[1461:33]

data structure follows the first-in,

[1461:35]

first-out principle? A queue, a linked list, a

[1461:39]

stack, or a hash table? First in, first

[1461:42]

out, aka FIFO.

[1461:46]

Which of these is FIFO?

[1461:51]

All right. First in, first out is in

[1461:55]

fact a queue as you would hope if you're

[1461:56]

getting in line for a restaurant, for a

[1461:58]

store. You'd hope that if you're the

[1462:00]

first one in line, you're going to be

[1462:02]

the first one out equitably speaking.

[1462:04]

And so it is in fact a queue. The

[1462:05]

opposite of that in some sense then

[1462:07]

would have been a stack whereby when you

[1462:08]

think about the cafeteria trays, the

[1462:10]

sort of first one in is actually the

[1462:12]

last one out. So LIFO instead for a

[1462:14]

stack. All right, question seven. Which

[1462:17]

operator returns the memory address of a

[1462:20]

variable? An asterisk, a dollar sign, an

[1462:23]

ampersand, or a hyphen and a greater than

[1462:26]

sign.

[1462:28]

presumably in C

[1462:32]

which returns the memory address of a

[1462:35]

variable.

[1462:39]

All right, let's see what everyone

[1462:41]

thinks.

[1462:42]

So the most popular and correct answer

[1462:45]

is the ampersand. This is the address-of

[1462:47]

operator. The asterisk, recall, in most

[1462:49]

contexts is the opposite of that. That's

[1462:51]

the dereference operator. It's actually

[1462:52]

go to an address. Um this is not a thing

[1462:56]

in C. Uh this though is similar in

[1462:59]

spirit to a combination of the star

[1463:00]

operator and the dot operator which

[1463:03]

means to dereference and follow a pointer

[1463:05]

to something inside of a struct

[1463:07]

typically. All right, question eight.

[1463:10]

Which SQL command is used to remove

[1463:11]

duplicate rows from a result set?

[1463:14]

Remove, unique, distinct, or clean?

[1463:19]

We didn't spend a huge amount of time on

[1463:21]

these keywords,

[1463:23]

but only one of them applies here. A

[1463:27]

result set is just the answers that you

[1463:28]

get back when doing your select. And if

[1463:31]

you want to filter out duplicates, you

[1463:34]

can in fact say

[1463:37]

distinct is correct. Unique is also a

[1463:39]

keyword in SQL, but that is when you

[1463:41]

want to define in your schema that a

[1463:43]

column's values are going to be unique,

[1463:45]

like an email address column instead.

[1463:47]

Distinct is how you filter out

[1463:48]

duplicates in your selects. All right,

[1463:50]

question nine. We're past the halfway

[1463:52]

mark. What does an HTTP code of 418

[1463:56]

signify? Not found. I'm a teapot.

[1463:58]

Forbidden, unauthorized.

[1464:02]

418.

[1464:04]

This too. If you know this one, moving

[1464:06]

forward, you'll be considered among the

[1464:08]

[1464:10]

elite.

[1464:13]

answers are coming in a little slower,

[1464:14]

but I'm a teapot is correct, which is

[1464:16]

not actually a thing or useful

[1464:18]

technology. It was in fact an April

[1464:20]

Fool's joke years ago where a bunch of

[1464:22]

computer scientists got together in a

[1464:23]

room and wrote out an entire

[1464:25]

specification for what it means for a

[1464:27]

server to return 418. I'm a teapot. All

[1464:30]

right, number 10. Where does malloc

[1464:34]

dynamically allocate memory from? The

[1464:36]

heap, the stack, global variables, or

[1464:40]

assembly?

[1464:48]

All right,

[1464:50]

heap is in fact correct. That's the sort

[1464:52]

of top part of the memory. Even though

[1464:53]

top and bottom make no actual technical

[1464:55]

sense. It's just our artist rendition

[1464:56]

thereof. The stack recall is what is

[1464:58]

used when functions are being called.

[1465:00]

Every time a function is called, it gets

[1465:02]

a so-called frame on the stack. That's

[1465:03]

where your local variables and your

[1465:04]

arguments get put. But if in C you use

[1465:06]

malloc, it does in fact end up on the

[1465:08]

heap. In C, if you allocate memory with

[1465:11]

malloc but forget to call free, what

[1465:13]

problem can occur? A memory leak,

[1465:14]

segmentation fault, stack overflow, or

[1465:17]

all of the above?

[1465:20]

If you allocate memory with malloc but

[1465:22]

forget to call free, what problem can

[1465:25]

occur?

[1465:31]

All right, most popular answer is in

[1465:34]

fact memory leak, which is correct. Um,

[1465:38]

you could imagine scenarios in which you

[1465:40]

also get a segmentation fault and/or a

[1465:42]

stack overflow, but those aren't direct

[1465:43]

consequences of not calling free. That's

[1465:46]

generally the consequence of using too

[1465:48]

much memory, for instance, or in this

[1465:50]

case doing something wrong with your

[1465:51]

memory. So interrelated, yes, but in

[1465:53]

terms of not calling free for each

[1465:55]

malloc, this is what's going to happen by

[1465:57]

definition. All right, well done there.

[1466:00]

Next question, which is 12.

[1466:03]

What does this domain name give the web

[1466:06]

page of? safetyschool.org. Is it Harvard

[1466:09]

University? Is it Princeton University?

[1466:11]

Is it Yale University? Or Columbia

[1466:14]

University?

[1466:19]

All right. Recall that this was in the

[1466:21]

context of our HTTP redirections.

[1466:24]

Yes. Interesting. Yes. In fact, uh Yale

[1466:27]

University, some alum has been paying

[1466:29]

like $10 a year for like 20 years for

[1466:31]

this joke. safetyschool.org, if you visit

[1466:33]

it returns an HTTP 301 uh HTTP header

[1466:37]

which says the location of it is in fact

[1466:39]

yale.edu.

[1466:41]

All right 13 three to go. What is the

[1466:44]

purpose of DNS? Uh to encrypt data sent

[1466:46]

over the dark web to find the nearest

[1466:50]

coffee shop for you to protect your

[1466:52]

location against hackers or to translate

[1466:54]

domain names into IP addresses.

[1466:58]

What is the purpose of DNS? If helpful,

[1467:01]

domain name system.

[1467:06]

All right, about at the 200 mark and the

[1467:08]

correct answer is indeed domain names

[1467:10]

into IP addresses. That is a server that

[1467:13]

is on your home network, on your ISP's

[1467:15]

network, on your campus's network, your

[1467:17]

corporate network. That just answers

[1467:18]

questions like that for you. All right,

[1467:20]

second to last question. Which of the

[1467:22]

following is not a built-in SQL feature

[1467:24]

to tackle race conditions? Begin

[1467:27]

transaction, commit, rollback, or

[1467:29]

enroll?

[1467:35]

We talked ever so briefly about this in

[1467:37]

the context of ending up with too much

[1467:39]

milk. Recall

[1467:42]

and the correct answer is

[1467:45]

indeed enroll. All three of those others, even

[1467:47]

though you didn't have to use them for

[1467:49]

problem set seven or nine um are indeed

[1467:52]

uh features of SQL. Uh but enroll is not

[1467:56]

a thing. All right. And the very last

[1467:58]

question. and try to answer this as

[1468:00]

quickly as you can. What does Professor

[1468:02]

Malan say at the beginning of every CS50

[1468:05]

lecture? Welcome to Harvard's computer

[1468:07]

science class. Hello everyone. Ready to

[1468:09]

code? All right, this is CS50

[1468:13]

or let's [laughter] get started with

[1468:15]

some programming.

[1468:19]

All of these questions were in fact

[1468:21]

written by you all.

[1468:25]

All right. And the correct answer, I'm

[1468:27]

pretty sure with 98% of you saying so,

[1468:30]

is all right, this is CS50. And all

[1468:33]

right, this was CS50. Cake is now

[1468:37]

served.

[1468:39]

[applause]

[1468:44]

[music]

[1469:01]

>> [music]

Raw Transcript

Full transcript without timestamps

If you want to learn about computer science and the art of programming, this course is where to start. CS50 is considered by many to be one of the best computer science courses in the world. This is a Harvard University course taught by Dr. David Malan, and we are proud to bring it to the freeCodeCamp channel. Throughout a series of lectures, Dr. Malan will teach you how to think algorithmically and solve problems efficiently. And make sure to check the description for a lot of extra resources that go along with the course. >> [music] [music] >> All right. This is [applause] This is CS50, Harvard University's introduction to the intellectual enterprises of computer science and the art of programming. My name is David Malan and this is week zero. And by the end of today, you'll know not only what these light bulbs here spell, but so much more. But why don't we start first with the, uh, the elephant in the room. That is artificial intelligence, which is seemingly everywhere over the past few years. And it's been said that it's going to change programming. And that's absolutely the case. It's been that way actually for the past several years and is only going to be the case all the more. But this is an incredibly exciting time. This is actually a good thing, I do think, insofar as now, using AI in any number of forms, you can ask the computer to help solve some problem for you. You can find some bug or mistake in your code. Better still, increasingly you can tell the AI what additional features you want to add to your software. And this is huge because, even in industry, where humans have been programming in some form for decades, building products and solutions to problems, the reality is that you and I as humans have long been the bottleneck. There's only so many hours in the day. There's only so many people on your team or in your company, and there's so many more bugs that you want to solve and so many more features that you want to implement. 
But at the same time, you still really need to understand the fundamentals. And indeed, a class like this CS50 has never been about teaching you how to program. Like that's actually one of the side effects of taking a class like this. But the overarching goal is to teach you how to think, how to take input and produce correct output and how to master these and other tools. And so by the end of the semester, not only you will be not only will you be acquainted with languages like Scratch, which we'll touch on today if you've not seen it already, languages like C and Python and SQL, HTML, CSS, and JavaScript. You'll be able to teach yourself new things ultimately, and ultimately be able to tell computers increasingly what it is you want it to do. But you'll still be in the driver's seat, so to speak. You'll be the pilot. You'll be the conductor. Whatever your preferred metaphor is. And that's what I think is so empowering still about learning introductory material, foundational material, because you'll know what you're ultimately talking about and what you can in fact solve. And we've been through this before, like when calculators came out. It's still valuable, I dare say, all these years later to still know how to do addition and subtraction and whatnot. And yet, I think back on some of my own math classes. I remember learning so many darn ways in college how to take derivatives and integrals. And after like the six process of that, I sort of realized, okay, I get it. I get the idea. Do I really need to know this many ways? And here too, with AI and with code, can you increasingly sort of master the ideas and then lean on a a co-pilot assistant to actually help you solve those same problems. So, let's do some of this ourselves here. In fact, just to give you a teaser of what you'll be able to do yourselves before long, let me go ahead and open up a little something called Visual Studio Code, aka VS Code for short. 
This is popular, largely open-source or free software that's used by real-world people in industry to write code. And it's essentially a text editor, similar to Notepad if you're familiar with that, or TextEdit, kind of like Google Docs but with no boldfacing and underlining and things like that that you'd find in word processing programs. And this is CS50's version thereof. We're going to introduce you to this all the more next week. But for now, let's just give you a taste of what you can do with an environment like this. So I'm going to switch over to this program already running VS Code. And at the bottom of the screen, you're going to see a so-called terminal window. Again, more on that next week. But it's in this terminal window that I can write commands that tell the computer what I want it to do. For instance, let's suppose just for the sake of discussion that I want to make my own chatbot, not ChatGPT or Gemini or Claude; let's make our own in some sense. So, I'm going to code up a program called chat.py. And you might be familiar with the language I'm using here: .py means it's just called Python. And if unfamiliar, you're in good company. You'll learn that too within a few weeks. And at the top of the file here, I can write my code. And at the bottom of the window here, I can run my code. So, here's how relatively easy it is nowadays to write even your own chatbot using the AI technologies that we already have. I'm going to go ahead and type the following: from openai import OpenAI. We'll learn what this means ultimately, but what I'm going to do is write my own program on top of an API, an application programming interface, that someone else provides, a big company called OpenAI, and they're providing features and functionality that now I can write code against. I'm going to create a so-called client, which is to say a program of my own that's going to use this OpenAI software. 
And then I'm going to go ahead and ask this software for a response. And I'm going to set that equal to client.responses.create, whatever all that means. And then inside of these parentheses I'm going to say the following. The input I want to give to this underlying API is, quote unquote, something like: in one sentence, what is CS50? Much like I would ask ChatGPT itself. If you're familiar with things like ChatGPT and AI more generally nowadays, you know there's this thing called models, which are like statistical models that ultimately drive what the AIs can do. I'm going to go ahead and say model equals, quote unquote, gpt-5, which is the latest and greatest version, at least as of today. Now down in my terminal window I'm going to run a different command, python of chat.py, and so long as I have made no typographical errors in this program, I should be able to ask OpenAI, not with chatgpt.com but with my own code, for the answer to some question. But I want to know what the answer to that question is. So, I actually want to print out that response by saying print(response.output_text). In other words, in these 10 lines, and it's not even 10 lines because a few of them are blank, I've implemented my own chatbot that at the moment is hard-coded, that is, permanently configured, to only answer one question for me. And let's see, with the cross of the fingers: CS50 is Harvard University's introductory computer science course, the intellectual enterprises of computer science and the art of programming (weirdly familiar), covering problem solving, algorithms, data structures, and more using languages like C, Python, and SQL. Okay, interesting. But let's make the program itself more dynamic. Suppose you wanted to write code that actually asks the human what their question is, because very quickly might we want to learn something more than just this one question. So up here, I'm going to go and change my code and type something like this. I'll type prompt equals input with parentheses. 
More on this another time, too. But what I'm going to ask the user for is to give me an actual prompt. That is a question that I want this AI to answer. And down here, what you'll notice, even if you've never programmed before, is that I can do something somewhat intuitive in so far as line five is now asking the human for input. Let's just stipulate that this equal sign means store that answer in a variable called prompt where variables just like in math x, y, or z. Let's go ahead and store that in prompt. So the input I want to give to open ai now is that actual prompt. So, it's a placeholder containing whatever keystrokes the human typed in. If I now run that same command again, python of chat.py, hit enter, cross my fingers, I'll see now dynamic prompting. So, what's a question I might want to ask? Well, let's just say it again. In one sentence, whoops, in one sentence, what is CS50? Question mark. Enter. And now the answer comes back as probably roughly the same but a little bit different a variant thereof. But maybe we can distill this even more succinctly. How about let's run it again. Python of chat.py and let's say in one word what is CS50 and see if the underlying AI obliges. And after a pause course in a word. So that's not all that incorrect. And maybe we can have a little fun with this. 
Now how about: in one word, which is better, maybe Harvard or Stanford, question mark. Hope you picked right. Let's see. The answer is: depends. Okay, so it would not in fact oblige. But notice what I keep doing in this code. I keep providing a prompt as the human, like in one sentence, in one word. Well, if you want the AI to behave in a certain way, why don't we just tell the underlying system to behave in that way, so I, the human, don't have to keep asking it in one sentence, in one sentence, in one word? So we can actually introduce one other feature that you'll hear discussed in industry nowadays, which is not only a prompt from the user, which I'm going to now temporarily rename to user prompt just to make clear it's coming from the user. I'm going to also give it what's called a system prompt by setting this equal to some standardized instructions that I want the AI to respect, like limit your answer to one sentence, quote unquote. And now, in addition to passing in as input the user prompt, I'm going to actually tell OpenAI to use these instructions coming from this other variable called system prompt. So, in other words, I'm still using the same underlying service, but I'm handing it now not only what the user typed in, but also this standardized text, limit your answer to one sentence. So, the human like me doesn't have to do that anymore. Let's now go back to my terminal, run python of chat.py once more, and this time we'll be prompted, but now I can just ask what is CS50, question mark, and I'll likely get a correct and similar answer to before. And indeed, it's Harvard University's flagship introductory computer science course, dot dot dot. So seems spot on too. But now we can have some fun with this too, and you might know that these GPTs nowadays have sort of personalities. You can oblige them to behave in one way or another. Why don't we go into our system prompt here and say something silly like pretend you're a cat. And now let's go back to the prompt one final time. 
Run python of chat.py. The prompt again will be, say, what is CS50? And with a final flourish of hitting enter, what do we get back? CS50 is Harvard University's introductory computer science course teaching programming, algorithms, data structures, and problem solving. And it's available free online. Meow. So that [snorts] was enough to coerce this particular behavior. So this is to say that with programming, you have the ability in like 10 lines of text, not all of which you might understand yet, but that's the whole point of a class like this, to build fairly powerful things, maybe silly things like this. But in fact, it's using these same primitives that CS50 has its own virtual rubber duck. And we'll talk more about this in the weeks to come, but long story short, in the world of programming, it's kind of a thing to keep a rubber duck literally on your desk, or really any inanimate cute object like this, because when you are struggling with some problem, some bug or mistake in your code, and you don't have a friend, a teaching assistant, a parent or someone else who's more knowledgeable than you about code, well, you literally are encouraged in programming circles to, like, talk to the rubber duck. And it's through that process of just verbalizing your confusion and organizing your thoughts enough to convey it to another person, or duck in this case, that so often that proverbial light bulb goes off and you realize, ah, I'm being an idiot, now I hear in my own thoughts the illogic or the mistake I'm making, and you solve that problem as well. 
So CS50, drawing inspiration from this, will give to you a virtual duck in computer form, and in fact among the other URLs you'll use over the course of the semester is this here, cs50.ai, which is also built into that previous URL, cs50.dev, whereby these are the AIs you can use in CS50 to solve problems, and you are encouraged to do so. As you'll see in the course syllabus, it is not reasonable, it is not allowed, to use AI-based software other than CS50's own, be it Claude, Gemini, ChatGPT or the like, but it is reasonable and very much encouraged along the way to turn not only to humans like me, your teaching assistant and others in the class, but to CS50's own AI-based software. And what you'll find is that this virtual duck is designed to behave as close to a good human tutor as you might expect from an actual human in the real world: it knows about CS50 and knows how to lead you to a solution, ideally without simply spoiling it and providing it outright. So with that said, that's sort of the endgame, to be able to write code like that and more. But let's really start back at the beginning and see how we can get from the zeros and ones that computers speak all the way back to artificial intelligence. So computer science is in the name of the course, Computer Science 50. But what is that? Well, it's really just the study of information. How do you represent it? How do you process it? And very much germane to computer science is what the world calls computational thinking, which is just the application of ideas from computer science, or CS, to problems generally in the real world. And in fact, that's ultimately, I dare say, what computer science really is. It's about problem solving. And even though we use computers and you learn how to program along the way, these are really just tools and methodologies that you can leverage to solve problems. Now, what does that mean? Well, a problem is perhaps most easily distilled into a simple picture like this. 
We've got some input, which is like the problem we want to solve, and the output, which is the goal we want, the solution thereto. And then somewhere in the middle here is the proverbial black box, the sort of secret sauce that gets from that input to that output. So, this then, I would say, is in essence problem solving and thus computer science. But we have to agree, especially if we're going to use devices, Macs, PCs, phones, whatever: how do we all represent information, the inputs and the outputs, in some standardized way? Is it with English? Is it with something else? Well, you all probably know, even if you're not computer people, that at the end of the day, computers somehow use zeros and ones entirely. That is their entire alphabet. And in fact, you might be familiar already with certain such systems. So the unary notation, which means you essentially use single digits, like fingers on your hand. For instance, unary, aka base one, is something you can do on your own human hand. So for instance, with one human hand, how high can I count? >> All right, so hopefully 1 2 3 4 5, and if you want to count to six and uh to 11 and 10 and so forth, you need to, you know, take out another hand or your toes or the like, because it's fairly limiting. But if I think a little harder, instead of just using unary, what if I use a different system instead? What about something like binary? Well, how high, if you think a little harder, can you count on one human hand? So 31, says someone who studied computer science before. But why is that? It's kind of hard to imagine, right? Because 1 2 3 4 5 seems to be the five possible patterns. But that's only when you're looking at the totality of fingers that are actually up. Five in total or four in total or one or the like. But what if we take into account the pattern of fingers that are up and we just standardize what each of those fingers represents? 
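
A minimal sketch of the chat.py program narrated above, assuming the openai Python package is installed and an API key is available in the OPENAI_API_KEY environment variable. The helper names build_request and ask are hypothetical and only organize the narrated steps; the model identifier "gpt-5" is assumed from the spoken "GPT-5".

```python
DEFAULT_SYSTEM_PROMPT = "Limit your answer to one sentence."


def build_request(user_prompt, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Collect the keyword arguments the lecture passes to responses.create."""
    return {
        "model": "gpt-5",  # assumed identifier for the model named in the lecture
        "instructions": system_prompt,  # the "system prompt"
        "input": user_prompt,  # the "user prompt"
    }


def ask(user_prompt, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Send one question to the API and return the text of the reply."""
    from openai import OpenAI  # third-party package, imported only when needed

    client = OpenAI()  # reads the API key from the environment
    response = client.responses.create(**build_request(user_prompt, system_prompt))
    return response.output_text
```

To reproduce the final demo, one might call print(ask(input("Prompt: "), "Pretend you're a cat.")); the reply will vary from run to run, so no particular output should be expected.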
So maybe we all agree like a good computer would too that maybe no fingers up means the number zero. And if we want to count to one, let's go with the obvious. This is now one. But instead of two being this, which was my first instinct, maybe two can just be this. A single second finger up like this. And that means we could now use two fingers up to represent three. I'll propose we can use just one middle finger up to offend everyone, but represent four. I could maybe use these two fingers with some difficulty to represent five, six, seven. I'm already up to seven having used only three fingers. And in fact, if we keep going higher and higher, I bet I can get as high as 31 for 32 possible combinations, but the first one was zero. So that's as high as we can count. So we'll make this connection in just a moment. But what I started to do there is something called base 2. Instead of just having fingers up or fingers down, I'm taking into account the positions of those fingers and giving meaning to like this finger here, this finger here, this finger here and so forth. Different weights if you will. So the binary system is indeed all computers understand. And you might be familiar with some terminology here. Binary digit is not really something anyone really says, but the shorthand for that is going to be bit. So if you've heard of bits and we'll soon see bytes and then kilobytes and megabytes and gigabytes and terabytes and more. This just refers to a bit meaning a single binary digit either a zero or a one. A zero is perhaps most simply represented by just like turning maybe keeping a finger down or in the world of computers which have access to electricity be it from the wall or maybe a battery. You know what we could do? We could just decide sort of universally that when a light bulb is off, that thing represents a zero. And when the light bulb is on, that thing's going to represent a one instead. Now, why is this? Well, electricity is such a simple thing, right? 
It's either flowing or it's not. And we don't even have to therefore worry about how much of it is flowing. And if you're vaguely remember a little bit about voltage, we can sort of be like zero volts, nothing's there available for us. Or maybe it's 5 volts or something else in between. But what's nice about binary only using zeros and ones is that it maps really nicely to the real world by like throwing a light switch on and off. You can represent information by just using a little bit of electricity or the lack thereof. So what do I mean by this? Well, suppose we want to start counting using binary zeros and ones only. Well, let's think of them metaphorically as like akin to these light bulbs here. And in fact, let me grab a few of these light bulbs and let me propose that if we want to represent the number zero, well, it stands to reason that here single light bulb that is off can be agreed upon as representing zero. Now, in practice, computers don't have little light bulbs inside, but they do have little switches inside. Millions of tiny little things called transistors that if turned on can allow it to capture a little bit of electricity and effectively turn on a metaphorical bulb or the switch can go off. the transistor can go off and therefore let the electricity dissipate and you have just now a zero. Unfortunately, even though I can let some electricity, there's the battery I mentioned is required. Even though we might have some electricity available to us, I can therefore count to one. But how do I go about counting? [snorts] Hardware problem. How do I go about counting higher than one with just a light bulb? Yeah. So, I need more of them. So, let me grab another one here. And now I could put it next to it. And this two I'll claim is just still the number one. But if I want to turn two of them on, well, that would mean I could count to two. And if I maybe grab another one, now I can count as high as three. But wait a minute. 
I'm doing something wrong because with three human fingers, how high was I able to count? So, seven in total, starting at zero. So, I've done something wrong here. But let me be a little more clever, then, about the pattern that I'm actually using. Perhaps this can still be one. But just like my finger went up, and only one finger, in the second version of this, this can be what we represent as two. Which one do I want to turn on as three? Your left or your right? >> So, your right, because now this matches what I was doing with my fingers a moment ago. And I claimed we could represent three like this. If we want to represent four, that's fine. We have to turn that off, this off, and this on. And that's somehow four. And let's go all the way up to seven. Which ones need to be on to represent the number seven? All right. So, all of them here. Now, if you're not among those who just sort of naturally said all of them, like, what the heck is going on? How do half the people in this room know what these patterns are supposed to be? Well, maybe you're remembering what I did with my fingers. But it turns out you're already pretty familiar with systems like this, even if you might not have put a name to it. So in the human world, the real world, most of us deal every day with the so-called base 10 system, otherwise known as decimal, "dec" implying 10, because in the decimal system you have 10 digits available to you, 0 through 9. In the binary system, we only had two, "bi" implying two, so 0 and 1, and in unary we had just one, a single digit there or not. So in the decimal system, we just have more of a vocabulary to play with. And yet you and I have been doing this since grade school. So this is obviously the number 123. But why? It's technically just three symbols. 1 2 3. But for most of us, your mind's eye goes, okay, 123. Pretty obvious, pretty natural. 
But at some point, you, like me, were probably taught that this is the ones place and this is the tens place and this is the hundreds place and so forth. And the reason that this pattern of symbols 1 2 3 is 123 is that we're all doing some quick mental math and realizing, well, that's 100 * 1 + 10 * 2 + 1 * 3. Oh, okay. There's how we get 100 + 20 + 3, which gives us the number we all know mathematically is 123. Well, it turns out whether you're using decimal or binary or other base systems that we'll talk about later in the course, the system is still fundamentally the same. Let's kind of generalize this away. Here's a three-digit number in some base system, specifically in decimal. And I know that only because of the placeholders that I've got on top of each of these numbers. But if we do a little bit of math here, 1, 10, 100, 1,000, 10,000 and so forth, what's the pattern? Well, technically this is 10 to the 0, 10 to the 1, 10 to the 2, and so forth. And we're using 10 because we can use as many as 10 digits under each of those columns. But if we take some of those digits away and go from decimal down to binary, the motivation being it's way easier for a computer to distinguish electricity being on or off than coming up with like 10 unique levels of electricity to distinguish among. You could do it. It would be annoying and difficult to build in hardware. It's so much simpler to just say on and off. It's a nice simple world that way. So let's change the base from 10 to two. And what does this get us? Well, if we now do the math, 2 to the 0 is 1, 2 to the 1 is 2, 2 to the 2 is 4. So the mental math is now about to be the same, but the columns represent something a little bit different. So for instance, if I turn all of these off again, such that I've got off, off, off, otherwise known as 0 0 0, it's zero because 4 * 0 + 2 * 0 + 1 * 0 still gives me zero. 
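
The place-value arithmetic above is the same in any base, and can be sketched in a few lines of Python. The function name place_value_sum is hypothetical, chosen only for this illustration.

```python
def place_value_sum(digits, base):
    """Return the number that a string of digits represents in the given base."""
    total = 0
    # The rightmost digit sits in the base**0 column, the next in base**1, and so on.
    for position, digit in enumerate(reversed(digits)):
        total += int(digit) * base ** position
    return total


print(place_value_sum("123", 10))  # 100 * 1 + 10 * 2 + 1 * 3
print(place_value_sum("000", 2))   # 4 * 0 + 2 * 0 + 1 * 0
print(place_value_sum("111", 2))   # 4 * 1 + 2 * 1 + 1 * 1
```

The three calls print 123, 0, and 7 respectively; only the weights of the columns change when the base does.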
By contrast, if I turn on maybe just this one all the way over on the left, well, that's four times one because on represents one and off represents 0 plus 2 * 0 + 1 * 0, that gives me four. And if I turn both of these on, such that all three of them are now on, on on aka one, one, one, that's 4 * 1 + 2 * 1 + 1 * 1. That then gives me seven. And we can keep adding more and more bits to this. In fact, if we go all the way up uh numerically, here's how we would represent in binary the number you and I know is zero. Here's how we would represent one. Here's how we would represent two and three and four and five. And you can kind of see in your mind's eye now because I only have zeros and ones and no twos or threes, not to mention nines, I'm essentially going to be carrying a one in a moment if we were to be doing some math. So to go from five to six, that's why the one ends up in the middle column. To go to seven here gives us now 1 one or on on on. How do I represent eight using ones and zeros? Yeah, >> we need to add another digit. >> Yeah. So we're going to need to add another digit. We need to throw hardware at the problem using an additional digit so that we actually have a column representing eight. Now, as an aside, and we'll talk about this before long, if you don't have an additional digit available, if your computer doesn't have enough memory, so to speak, you might accidentally count from 0 1 2 3 4 5 6 7 and then accidentally end up back at zero. Because if there's no room to store the fourth bit, well, all you have is part of the number. And this is going to create all sorts of problems then ultimately in the real world. So let me go ahead and put these back and propose that we have a system now. If you agree to sort of count numbers in this way via which we can represent information in some standard way and all the device underneath the hood needs is a bit of electricity to make this work. 
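
As a rough illustration of the wrap-around just described, the sketch below keeps only three bits of each number, which is one simple way to model what happens when there is no room to store a fourth bit. The name to_bits and the choice of three bits are assumptions made for this example.

```python
BITS = 3  # pretend the hardware only has room for three bits


def to_bits(number, bits=BITS):
    """Format a number as a fixed-width pattern, discarding any bits that do not fit."""
    return format(number % (2 ** bits), f"0{bits}b")


# Count from 0 to 8; the ninth value no longer fits in three bits.
patterns = [to_bits(n) for n in range(9)]
print(patterns)
```

The list runs from 000 up to 111 and then shows 000 again for eight, because the bit that would represent the eights column has nowhere to go.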
It's got to be able to turn things on, aka use some transistors, and it's got to be able to turn those things off so as to represent zeros instead of ones. But the reality is that two bits, three bits, four bits aren't very useful in the real world because even with three bits you can only count to seven, and with four you can count to 15. These aren't very big numbers. So it tends to be more common to actually use units of measure of eight bits at a time. A byte is just that: one byte is eight bits. So if you've ever used the vernacular of kilobytes, megabytes, gigabytes, that's just referring to some number of bits. But eight of them together compose one individual byte. So here, for instance, is a byte's worth of bits. Eight of them total. I've added all the additional placeholders. And what number does this represent in decimal, even though you're looking at eight binary digits? >> Just zero, because literally every column is a zero. Now this is a bit more mental math, unless you know it already. What if I change all of the zeros to ones? I turn all eight light bulbs on. What number is this? >> Yeah. So 255. Now for those of you who didn't get that instantly, that's fine. You could certainly do the math manually. I dare say some of you have some prior knowledge of how to do this sort of system. But 255 means that if you start counting at zero and you go all the way up to 255, okay, that's 256 total possibilities once you include zero in the total number of patterns of zeros and ones. And this is just going to be one of these common numbers in computer science. 256. Why? Because it's referring to eight of something. 2 to the 8 gives you 256. And so you're going to commonly see certain values like that. 256. Back in the day, computers could only show 256 colors on the screen. 
Certain graphics formats nowadays that you might download can only use as many as 256 colors because, as we'll see, they're only using, for instance, eight bits, and therefore they can only represent so many colors of the rainbow as a result. So this, then, is how we might go from just zeros and ones, electricity inside of a computer, to storing actual numbers with which we're familiar. And honestly, we can go higher than 255. What do you need to count higher than 255? A 9th bit, a 10th bit, an 11th bit, and so forth. And it turns out a common convention nowadays, and we'll see this in code too, is to use as many as 32 bits at a time. So that's a good chunk of bits. And anyone want to ballpark how high you can count if you've got 32 bits available to you? Oh, fewer people now. Yeah, in the back. >> Yeah. So, it's roughly 4 billion. And it's technically two billion if you also want to represent negative numbers, but we'll revisit that question. But 2 to the 32nd power is roughly 4 billion. However, nowadays it's even more common, with the Macs and PCs you might have on your laps and even your phones, to use 64 bits, which is a big enough number that I'm not even sure offhand how to pronounce it. That's a lot of permutations. That's 2 to the 64th possible permutations, but that's increasingly commonplace. And as an aside, just to dovetail things with our discussion of AI, among the reasons that we're living through, over these past few years especially, this crazy interesting time of AI is that computers have been getting so much faster, exponentially so over time, and they have so much more memory available to them. There's so much data out there, on the internet in particular, to train these models that it's an interesting confluence of hardware now actually meeting the mathematics and statistics that we'll talk about later in the class that ultimately make tools like the cat we just built possible.
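For anyone who wants the exact figures behind "256," "roughly 4 billion," and the hard-to-pronounce 64-bit number, here is a small Python sketch. The helper name `count_patterns` is invented for this note; the arithmetic is just 2 raised to the number of bits.

```python
def count_patterns(bits):
    """Number of distinct patterns of zeros and ones that fit in `bits` bits."""
    return 2 ** bits


for bits in (8, 32, 64):
    patterns = count_patterns(bits)
    # Counting starts at zero, so the largest value is one less than the count.
    print(f"{bits} bits: {patterns:,} patterns, largest value {patterns - 1:,}")

# If half of the 32-bit patterns are set aside for negative numbers,
# the largest positive value is roughly two billion.
print(f"{count_patterns(31) - 1:,}")  # 2,147,483,647
```

Eight bits give 256 patterns, 32 bits give 4,294,967,296, and 64 bits give 18,446,744,073,709,551,616.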
But of course computers are not all math and in fact we'll use very little math per se in this class. And so let's move away pretty quickly from just zeros and ones and talk about letters of the alphabet. Say in English here is the letter A. Suppose you want to use this letter in an email, a text message, or any other program. What is the computer doing underneath the hood? How can the computer store a capital letter A in English? If at the end of the day, all the computer has access to is a source of electricity from the wall or from a battery and it has a lot of switches that it can turn on and off and treat the electricity in units of 8 or 32 or 64 or whatever. How might a computer represent a letter A? >> Yeah, we need to give it an identity so to speak as an integer. In other words, at the end of the day, if your entire canvas, so to speak, consists only of zeros and ones. Like that is going to be the answer to every question today. You only have zeros and ones as the solution to these problems. We just need to agree what pattern of zeros and ones and therefore what integer, what number shall be used to represent the letter A. And hopefully when we look at that pattern of zeros and ones in the right context, we'll indeed see it as an A. So if we look inside of a computer so to speak in the context of like a text messaging program or a word processor or anything like that, that pattern shall be interpreted hopefully as a capital letter A. But if I open up Mac OS's or Windows or my phone's calculator program, I would want that same pattern of zeros and ones to be interpreted instead as a number. If I open up Photoshop, as we'll soon see, I want that same pattern of zeros and ones to be interpreted as a color presumably, not to mention videos and sound and so forth, but it's all just zeros and ones. 
And so, even though I, when writing that chat program a few minutes ago, didn't have to worry about telling the computer, oh, this is text, this is a number, this is something else, we'll see as we write code ourselves that you as the programmer will have control over telling the computer how to treat some pattern of zeros and ones, telling it this is a number, this is a color, this is a letter, or something else. So how do we represent the letter A? Well, it turns out a bunch of humans in a room years ago decided, ah, this pattern of zeros and ones shall be known globally as a capital English letter A. What is that number, if you do the quick mental math? So indeed, 65, because we had a one in the 64s place and a one in the ones place. So 65. That's just sort of it. It would have been nice if it were just the number one or maybe the number zero. But at least after the capital letter A, they kept things consistent, such that if you want to represent a letter B, it's going to be 66. Capital letter C, it's going to be 67. Why? Because the humans in this room, a bunch of Americans at the time, standardized on what's called ASCII, the American Standard Code for Information Interchange. It doesn't matter what the acronym represents, but it was just a mapping. Someone on a piece of paper essentially started writing down letters of the alphabet and corresponding numbers so that computers subsequently could all speak that same standard representation. And here's an excerpt thereof. In this case, we're seeing seven bits' worth, but eventually we ended up using eight bits in total to represent letters. And some of these are fairly cryptic. Maybe more on those another time. But down here, if we highlight just one column, we'll see that indeed, on this cheat sheet, 65 is capital A, 66 is B, 67 is C, and so forth. So, why don't we do a little exercise here? What pattern of zeros and ones do I see here? I've got three bytes, so three sets of eight bits.
And even though there are no placeholders now over the columns, what is this number? >> It's 60... >> Yeah. Yeah. So, we've got the ones, twos, fours, 8s, 16s, 32s, 64s columns. So, indeed, this is going to be the number 72. 72. This is not what computer scientists spend their day doing. This is just to reinforce what it is we just looked at. And I'll spoil it. These numbers are 72, 73, 33. And anyone in this room could have done that if you took out a piece of paper, figured out what the columns are, and just did a bit of quick mental or written math. But this is to say, suppose that you just got a text message or an email, and that you had the ability to look underneath the hood of the computer and see what pattern of zeros and ones you just received over the internet. Suppose that pattern of zeros and ones was three bytes of bits, which, when you do the math, are the numbers 72, 73, 33. Well, here's the cheat sheet again. What message did you just get? >> Yeah. So, it's hi. Why? Because 72 is H and 73 is I. Now, some of you said hi fairly emphatically. Why? Well, 33, it turns out, and you wouldn't know this unless you looked it up or someone told you, is an exclamation point. So, literally, if you were to text someone right now, if you haven't already, hi exclamation point in all caps, you would essentially be sending three bytes of information somehow over the internet to that recipient. And because their phone similarly understands ASCII, because it was programmed years ago to do so, it knows to show you hi exclamation point and not a number, three numbers no less, or colors or something else altogether. So here we then have hi, three digits in a row here. What else is worth noting here? Well, there's some fun sort of trivia embedded even in this cheat sheet. So here again is A, B, C, D, E, F, G, and so forth. 65 on down. Let me just highlight over here the lowercase letters: 97, 98, 99, and so forth.
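The three-byte message above can be decoded in a few lines of Python. This is only a sketch: the bit strings are written out by hand from the numbers 72, 73, and 33, and the variable names are made up for this note. `int(..., 2)` reads a string of binary digits and `chr` looks up the character for a number.

```python
# Three bytes, as they might arrive: eight bits each.
message_bits = ["01001000", "01001001", "00100001"]

# Interpret each pattern as a number (base 2).
numbers = [int(bits, 2) for bits in message_bits]
print(numbers)  # [72, 73, 33]

# Interpret each number as a character, per the ASCII cheat sheet.
text = "".join(chr(number) for number in numbers)
print(text)  # HI!
```

Going the other way, `ord("H")` gives back 72.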
If I go back and forth, does anyone notice the consistent pattern between these two? >> Yeah. So, the lowercase letters are 32 away from the uppercase letters. Well, how do we know that? Well, 97 - 65 is, yeah, 32. 98 - 66 is, okay, 32. And that pattern continues. What does this mean? Well, computers know how to do this. Most normal humans don't need this information. But what it means is, if you are representing in binary, with your transistors on and off, some pattern, and this is the pattern representing capital letter A, which is why we have a one in the 64s place and a one in the ones place, how does a computer go about lowercasing this same letter? Yeah? >> Perfect. All the computer has to do is change this one bit in the 32s place to a one, because that has the effect mathematically, per our discussion, of adding the number 32 to whatever it is. So it turns out you can force text from uppercase to lowercase or back by just changing a single bit inside of that pattern of eight bits in total. All right, why don't we maybe reinforce this with another quick exercise? We have an opportunity here to give you some stress balls right at the very start of class. Could we get eight volunteers to come up on stage? Maybe over here and over here and over here on the left. Let me go all the way on the right. Let's see. Okay, the high hand here. The hand that's highest there. Yes, we're making eye contact. How about all the way... wait, let's see. Let's go here in the crimson sweatshirt here. And how about in the white shirt here? Come on up. Did I count correctly? Let's see. Come on down. The eight of you. I didn't count right, did I? 1 2 3 4 5 6. It's ironic that I'm not counting correctly. Eight here. How about on the left in gray? Okay. Oh, and, okay, in black here. Come on down. All right. Hopefully, this is eight. 1 2 3 4 5 6 7. Okay. Eight. There we go. All right. So, let's go ahead and do the following exercise.
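Before the volunteers line up, the single-bit case trick from a moment ago can be sketched in Python. The names `CASE_BIT`, `to_lower`, and `to_upper` are made up for this note, and the trick is only meaningful for the 26 plain English letters in ASCII; applying it to digits or punctuation gives unrelated characters.

```python
CASE_BIT = 0b00100000  # the bit in the 32s place


def to_lower(letter):
    """Turn the 32s bit on, which adds 32 to an uppercase letter's number."""
    return chr(ord(letter) | CASE_BIT)


def to_upper(letter):
    """Turn the 32s bit off, which subtracts 32 from a lowercase letter's number."""
    return chr(ord(letter) & ~CASE_BIT)


print(format(ord("A"), "08b"))  # 01000001 (64 + 1 = 65)
print(format(ord("a"), "08b"))  # 01100001 (64 + 32 + 1 = 97)
print(to_lower("A"), to_upper("a"))  # a A
```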
I've got some sheets of paper preprinted here. If each of you indeed want to do exactly what you're doing and line up from left to right, each of you is going to represent a placeholder, essentially. So we have over here the ones place, all the way over here. And then we have the twos place and the fours place and the 8s, 16, 32, 64, 128. And we come bearing a microphone, if each of you want to say a quick hello: your name, maybe your dorm or house, and something besides computer science that you're studying or want to. >> Hi, I'm... oh, that's loud. Okay. I'm Allison. I'm a freshman in Matthews, and I like climbing, and I'm thinking of CS and econ. >> Number two. >> Hi, I'm Lily. I'm in Hurlbut this year and I'm thinking of doing CS and government. >> Nice to meet you. >> Hi. Hi, I'm Sean. I'm in Canaday Hall and I'm thinking of doing astrophysics and CS. >> Welcome. >> Hi, I'm Jordan. I'm doing applied math with a specialization in CS and econ. And I'm in Wigglesworth and I like going to the gym. >> Okay, [laughter] nice. 16. >> Hi, I'm Shiv. I'm studying Macki and I'm in Canaday. >> Nice. >> Hi, I'm Sophia. I'm in the... thinking of doing electrical engineering. >> Welcome. >> Hi, my name is Marie and I'm in Canaday B and I really like CS, physics, and astrophysics. >> Hi, I'm Alyssa. I'm in Holworthy. I'm also thinking of studying math or physics and I also like to climb. >> Nice. Welcome to you all. So, on the backs of their sheets of paper, they have a little cheat sheet that's describing what they should do in each of three rounds. We're going to spell out together a three-letter word. You all as the audience have a cheat sheet above you that maps numbers to letters. These folks don't necessarily know what they're spelling. They only know what they individually are spelling. So if your sheet of paper tells you to represent a zero in a given round, just kind of stand there awkwardly, no hands up.
But if you're told on your sheet of paper to represent a one, just raise a single hand to make obvious to the audience that you're representing a one and not a zero. And the goal here is to figure out what we are spelling using this system called ASCII. All right, round one, execute. What number is this here? I'm hearing... you can just shout it out. What number? >> 66, or B. So, you're spelling B. All right, hands down. Round two. More math. Feel free to shout it out. >> Oh, I heard it. Yeah. 79, which is >> O. Okay, so we have B O. Hands down. Third and final round. Execute. Number 87? >> Yes. 87. Which is the letter? >> W. Which spells >> bow. If you want to take your bow now. >> Ah, okay. Here we go. You guys can keep those. Okay. Thanks. All right. You guys can head back. Thank you to our volunteers here. Very nicely done. We indeed spelled out bow, and that's just because we all standardized on representing information in exactly the same way, which is why when you type B on your phone or your computer, the recipient sees the exact same thing. But what's noteworthy in this discussion is that you can't spell a huge number of words. Like, yeah, English, okay, we've got that covered. But odds are you're noticing, depending on your own background and what human languages you read or speak yourself, that a whole bunch of symbols might be missing from your keyboard. For instance, we have accented characters here. In a lot of Asian languages, there are so many more glyphs than we could have even fit in that cheat sheet of numbers and letters. And so ASCII is not the only system that the world uses. It was one of the earliest, but we've moved on in modern times to a superset of ASCII that's generally known as Unicode. And Unicode uses so many more bits than ASCII that we even have room for all of these little things that we seem to send constantly nowadays. These are obviously images that you might send with your phone or your computer, but they're technically characters.
They're technically just patterns of zeros and ones that have similarly been standardized around the world to look a certain way, but this is an emoji keyboard in the sense that you're sending characters. You're not sending images per se. The characters are displayed as images, obviously, but really these are just like characters in a different font, and that font happens to be very colorful and graphical as well. So, Unicode goes beyond just seven or eight bits. If you do the quick mental math, if ASCII only used seven or, let's say, eight bits, how many possible characters can you represent in ASCII alone? 256. Because if we do that quick mental math, 2 to the 8th is 256 possibilities. Like, that's it. That's enough for English, because you can cram in all the uppercase letters, the lowercase letters, the numbers, and a whole bunch of punctuation as well. But it's not enough for certain other punctuation symbols, not to mention many other human languages. And so the Unicode Consortium's charge in life has been to come up with a digital representation of all human language, past, present, and hopefully future, by using not just seven or eight bits, but maybe 16 bits per character, 24 bits, or heck, even 32 bits per character. And per before, if you've got as many as 32 bits available to you, you can represent what, like 4 billion characters in total. And that's just one of the reasons why these emoji have kind of exploded in popularity and availability. There are just so many darn patterns. Like, what else are we going to do with all of these zeros and ones? But more importantly, emoji have been designed to really represent people and places and things and emotions in a way that transcends human language. But even then, they're somewhat open to interpretation. In fact, here's a pattern of, I think, 32 zeros and ones.
I'm guessing no one's going to do the quick mental math here, but this represents what decimal number, if we do in fact do out the math, with that being the ones place, all the way over to the left? Well, that's the number 4,036,991,106. Who knows what that is? It's not A, and it's nothing near an uppercase or lowercase letter, but it is among the most popular emoji that you might send typically on your phone, laptop, or other device. Namely, this thing here, face with tears of joy, which odds are you've sent or received recently. But interestingly, even though many of you might have iPhones and see and send the same image, you'll notice that if you have a friend who's got Android or some other device, or maybe you're using Meta's Messenger program or Telegram or some other messaging service, sometimes these emoji look a little bit different. Why? Because what Unicode has done is decide that there shall exist an emoji known as, excuse me, face with tears of joy. Then Apple and Google and Microsoft and others, they're sort of free to interpret that as they see fit. So what you see on the screen here is a recent version from iOS, Apple's operating system. Google's version of the same looks a little something like this. And on Telegram, if you have animations enabled, the same idea, face with tears of joy, is actually animated. But it's the same pattern of zeros and ones in each case. But again, they each essentially have different graphical fonts to present to you what each of those images actually is. All right. So, those are each images. How is the computer representing them, though? At the end of the day, we've represented numbers, we've represented letters, but how about these things here, colors? So, how do we represent red or green or blue, not to mention every other color in between? At the end of the day, we only have one canvas at our disposal. Yeah, so integers is the exact same answer as before.
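Returning briefly to the emoji: the 32-bit pattern and the decimal number 4,036,991,106 can be reproduced in Python. This sketch assumes the pattern on screen is the emoji's UTF-8 encoding (four bytes), which is consistent with that number but is an inference, not something stated in the lecture. Python's `\N{...}` escape looks a character up by its official Unicode name.

```python
emoji = "\N{FACE WITH TEARS OF JOY}"

# The bytes a program would store or send for this one character in UTF-8.
encoded = emoji.encode("utf-8")
print(encoded.hex())  # f09f9882

# Treat those four bytes as one 32-bit number.
as_number = int.from_bytes(encoded, "big")
print(as_number)                  # 4036991106
print(format(as_number, "032b"))  # 11110000100111111001100010000010

# Unicode's own number for the character (its code point) is smaller.
print(hex(ord(emoji)))  # 0x1f602
```

How those bits are drawn on screen is then up to each platform's emoji font.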
We just need to agree on what number we use for red, what we use for green, what we use for blue, and we can come up with some standardized pattern for this. In fact, one of the most common ways to do this in the real world is to use a combination of three colors together. Some amount of red, some amount of green, and some amount of blue, mixed together to get most any color of the rainbow that you might want. This is a picture of something I grew up with back in the day, where in middle school, when we'd watch movies or some kind of show in class, the projector screen would be over here. This is an old-school projector with three different lenses, one of which projects some amount of green, one some amount of red, one some amount of blue. And so long as the lenses are correctly oriented to all point at the same circle or rectangular region on the screen, you would see any number of colors coming to life in the old-school video. I still remember all these years later, we would kind of sit and lean up against it because it was super warm and you could hear it. Easy way to fall asleep back in grade school. But we use the same fundamental color system nowadays as well, including in modern programs like Photoshop. So let's abstract that away and focus on just three colors, some amount of red, green, and blue. And let's suppose for the sake of discussion that we want to mix together a medium amount of red, a medium amount of green, and just a little bit of blue. For instance, let's suppose that we'll use 72 as the amount of red, 73 as the amount of green, and 33 as the amount of blue, RGB. Now, why these numbers? Well, in the context of ASCII or Unicode, which is just a superset thereof, what does this spell? >> Hi.
But again, if you were instead to open a file containing these three numbers, or really these three bytes of bits, in Photoshop, you would hope that they're going to be interpreted not as letters on the screen, but as the color of a dot on the screen instead. So it turns out that typically, when you have three of these numbers together, each of them is using a single byte. So eight bits. So you can have zero red or 255 red, zero green or 255 green, or 0 to 255 of blue. So zero is none, 255 is the max. So if we mix these together, imagine that projector consolidating these three colors into one central point. Anyone want to guess what you're going to get if you mix some red, some green, some blue in those amounts? In way back? >> Yeah, you're going to get a dark shade of yellow. I've brightened it up a little bit for the projector here, but you're going to get roughly this shade of yellow. And we could play with these numbers all day long and get similar results if we want to represent different colors as well. And indeed, whether it's Photoshop or some other program, you can actually combine these amounts in all sorts of ratios to get different colors. So if you had 0 0 0, so no red, no green, no blue, take a guess as to what color that's going to be in the computer. >> So it's going to be black, like the absence of all three of those colors. But if you mix the maximal amount of each of those, 255 red and green and blue, that's going to give you white. Now, if any of you have made web pages before or used programs like Photoshop, you might have seen numbers like 00 or FF. Long story short, that's just another base system for representing numbers between zero and 255 as well. But we'll come back to that mid-semester when we make some of our own filters, in sort of an Instagram-like way, manipulating images of our own. So, where are these colors coming from, or where can we actually see them?
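The idea that the same three bytes can be read as text or as a color can be shown in a short Python sketch. The helper name `to_hex_color` is made up for this note; the `#rrggbb` notation it produces is the two-digit 00-to-FF style mentioned above, as used on web pages.

```python
data = bytes([72, 73, 33])  # the same three bytes in both interpretations

# Interpretation 1: text, via the ASCII mapping.
print(data.decode("ascii"))  # HI!

# Interpretation 2: a color, one byte each for red, green, and blue.
red, green, blue = data
print(red, green, blue)  # 72 73 33


def to_hex_color(red, green, blue):
    """Write each 0-255 amount as two base-16 digits, 00 through ff."""
    return f"#{red:02x}{green:02x}{blue:02x}"


print(to_hex_color(red, green, blue))  # #484921, a dark yellow
print(to_hex_color(0, 0, 0))           # #000000, black
print(to_hex_color(255, 255, 255))     # #ffffff, white
```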
Well, here's just a picture of that same emoji, face with tears of joy. If I kind of zoom in on that and maybe zoom in again, you can start to see, if you blow it up enough or if you put your eyes close enough to the device, individual dots or squares. These are generally known as pixels. And they're just the individual dots that collectively compose an image. Which is to say that each of these dots, which is part of the image, is going to be a distinct color. Like this one's yellow, this one's brown, and then there's a bunch in between. Well, you're using some number of bits to represent each of those pixels' colors. So, if you imagine using the RGB system, that's 8 + 8 + 8 bits. So, that's 24 bits or three bytes just to keep track of the color of each and every one of these dots. So now, if you think about having downloaded a GIF at some point, a PNG file, a JPEG, or any other file format, it's usually measured in what file size? Like megabytes, typically. That means millions of bytes. Why? Because if it's a pretty big photograph or pretty big image, each of those dots takes up at least three bytes, it would seem. And if you do out the math, if you've got thousands of dots, each of which uses three bytes, you're going to quickly get to megabytes, if not even larger for things like, say, videos. But again, it's just patterns of zeros and ones. And so long as the programmer knows what they're doing and tells the computer how to interpret those zeros and ones, and equivalently, so long as the software knows to look at these zeros and ones and interpret them as numbers or letters or colors, we should see what we intended to represent. All right, so that's colors and images. How many of you played with these little flip books as a kid, where they've got like a hundred different little pictures and you flip through them really quickly and you see what looks like animation in book form?
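The file-size arithmetic for pixels can be sketched as follows. The function name `raw_image_bytes` and the example dimensions are made up for this note, and the figure is for uncompressed pixel data only; real GIF, PNG, and JPEG files compress their data and add headers, so their sizes on disk differ.

```python
BYTES_PER_PIXEL = 3  # one byte each for red, green, and blue


def raw_image_bytes(width, height):
    """Bytes needed to store every pixel's color with no compression."""
    return width * height * BYTES_PER_PIXEL


# A hypothetical 1,000 x 1,000 image: a million dots, three bytes apiece.
size = raw_image_bytes(1000, 1000)
print(f"{size:,} bytes")  # 3,000,000 bytes, about 3 megabytes
```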
Well, this is essentially a video. So therefore, what is a video, or how can you think of what a video is? It's just a whole bunch of images flying across the screen, either on paper or digitally nowadays on your phone or your laptop. And that's kind of nice because we're sort of composing more interesting media now based on these lower-level building blocks. And this is going to be thematic. We literally started with zeros and ones. We worked our way up to letters. We then worked our way up to colors and thus images. Now we're up at this level of hierarchy in terms of video, because what's a video? It's like 30 images per second flying across the screen, or maybe slightly fewer than that, which collectively tricks our mind into thinking we are seeing motion pictures. And that's the old-school term for movies, but it literally is what it was. With motion pictures, the film was showing you 30 pictures per second, and it looks like motion even though you're just looking at images, much like this flip book, very quickly one after the other. What about music? Well, how could you go about representing musical notes if, again, your only ingredients are zeros and ones? Even if you're not a musician, how do you represent music like that on the screen here? Yeah. Okay. So, the frequency, like the tone that you're actually hearing from the device. What else might weigh in besides the frequency of the note? Yeah. >> So the speed of the note, or maybe the duration, like if you think about a physical piano, how long you're holding the key down for or not. What else? So the amplitude, maybe how loud, like how hard did you hit the keyboard to generate that sound. So let me propose, at the risk of simplifying, that we could represent each of these notes using three numbers, maybe 0 to 255 or some other range, that represent the frequency or the pitch of the note, the duration, and the loudness.
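As a toy illustration of the three-numbers-per-note proposal, here is a Python sketch that packs notes into bytes and reads them back three at a time. The note values, the names `encode_notes` and `decode_notes`, and the layout itself are invented for this note; this is not a real audio or MIDI file format.

```python
# Hypothetical notes as (pitch, duration, loudness), each between 0 and 255.
notes = [(60, 4, 100), (62, 4, 100), (64, 8, 127)]


def encode_notes(notes):
    """Flatten the notes into one run of bytes: three bytes per note."""
    return bytes(value for note in notes for value in note)


def decode_notes(data):
    """Read the bytes back three at a time, as the recipient would."""
    return [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]


encoded = encode_notes(notes)
print(len(encoded))           # 9 bytes for three notes
print(decode_notes(encoded))  # [(60, 4, 100), (62, 4, 100), (64, 8, 127)]
```

The round trip only works because sender and recipient agree on the grouping of three.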
And so long as the person receiving a file containing all of those zeros and ones knows how to interpret them three at a time, I bet you could share uh a musical file with someone else that they could hear in exactly the same way that you yourself intended. Let me pause here to see if there's any questions now because we've already built our way up from zeros and ones now to video and sound. >> Yeah, in front. >> How does the computer know differentiate between what the letter like 65 would be and then what the number 65? >> So, how does the computer distinguish between the letter 65 and the number 65? It's context dependent. So put simply and we'll see this as early as next week the programmer tells the computer how to display the information either as a number or a letter or equivalently once programmed the software knows that when it opens a GIF file or JPEG or something else to interpret those zeros and ones as colors instead of as like docx for a Microsoft Word file or the like. Other questions on any of these representations? Yeah. In front. Can we >> go over like the base 10 base 2 thing like really briefly? >> Sure. So, can we go over base 10 and base two? So, base 10 is like literally the numbers you and I use every day. It's base 10 in the sense that you have 10 digits at your disposal. 0 through 9. And any numbers you want to represent in the real world must be composed using 0 through 9. The binary system or base 2 is fundamentally the same. It's just the computer doesn't have access to two through 9. It only has access to zero and one. But much like the light bulbs I was displaying here, you can simply ascribe different weights to each of the digits. So that instead of it being as much as the ones place, the 10's place, and the hundred's place, if we more modestly say the ones place, the two's place, the four's place, we can use the same system. In binary, you might need to use more digits to count as high because in 255, you can just write 255. 
That's three digits in decimal. But in binary, we've seen you need to use eight such digits, which is more, but it's still much better than unary, which would have had 255 light bulbs on instead. >> And are binary and base 2 the same thing? >> Are binary and base 2 the same thing? Yes. Just like base 10 and decimal are the same thing as well. And unary and base 1 are the same thing as well. All right. So let me just stipulate that even though we sort of took this tour quickly, at the end of the day computers only have zeros and ones at their disposal. So again, the answer to any question as to how we can represent X is going to somehow involve permuting those zeros and ones into patterns, or equivalently into the numbers that they represent. But if we now have a way to represent all inputs in the world, be it letters, numbers, images, videos, anything else, and get output from some problem-solving process, how do we actually solve problems? Well, the secret sauce in the middle here is another term that you've probably heard in the real world nowadays, which is that of algorithm: step-by-step instructions for solving some problem. So, this ultimately is what computer science really is about too. It's not just representing information, but somehow processing it, doing something interesting with it to actually solve the problem that you've been provided as input, so you can output the correct answer. Now, there are all sorts of algorithms implemented in our phones and in our Macs and PCs, and that's all software is. It's an implementation in code, be it C++ or Java or anything else. Other languages exist too. It's in code that the computer understands, but it's still just step-by-step instructions. And among the things we'll learn in CS50 is how to express yourself in different ways to solve problems, not only in different languages, but using different methodologies as well.
Because as we'll see, among the reasons we introduce these several languages is that you don't just learn more and more languages that allow you to solve the same problems. Different languages will allow you to solve different problems and even save you time by being better tools for the job. So here, for instance, on an iPhone is a bunch of contacts, which is presumably familiar, where we might have a whole bunch of friends and family and whatnot alphabetized by first name or last name. And suppose we want to find one such person, like John Harvard, whose number here might be +1 949-468-2750. Feel free to call or text him sometime. This is the goal of this problem. If we have our contacts app and I start typing in John's name, by first name or last name, the autocomplete nowadays kicks in and it somehow filters the list down from my 10 friends or 100 friends or a thousand friends into just the single directory entry that matches. So here too, back in the days of that RGB projector, we had phone books like this. I'm pleased to say, thanks to our friend Alexis, this is the largest phone book that we've used for this demonstration. This is an old-school phone book that's essentially the same thing as our contacts app or address book nowadays, whereby I've got a whole bunch of names alphabetically sorted by first name or last name, whatever, and corresponding to each of those is a number. So, back in the day, and frankly even nowadays in your phones, how do you go about finding someone in a phone book or your contacts app? Well, you could very naively just start at the beginning and look down and just turn one page at a time, looking for John Harvard in this case. Now, so long as I'm paying attention, this step-by-step process will get me to John Harvard. Like, this is a correct algorithm, even though you might kind of object to how I'm doing this. Why? Like, what's bad about this algorithm? >> It's just slow. I mean, this is crazy slow.
If there's like a thousand pages in this phone book, which it looks like there are, this could take me as many as a thousand page turns, or maybe he's roughly in the middle, like 500 pages. Like, that's crazy. That's really rather slow, especially if I'm going to do this again and again. Well, what if I do it a little smarter? In grade school, I sort of learned how to count two at a time. So, 2 4 6 8 10 12 14 16 18. Again, if I'm paying attention, I'll get there twice as fast because I'm counting two at a time. But is that algorithm, step by step, correct? And I'm seeing no, but why? >> I might skip over John Harvard. So, just by bad luck, and kind of with 50/50 probability, he's going to be sandwiched between two of the pages. Now, I don't have to abort this algorithm altogether. As soon as I get past the J section, if we're doing it by first name, I could just double back one page and make sure that I haven't missed him. So, it's recoverable. And this algorithm therefore is sort of twice as fast, plus one extra step maybe to double back. But that's arguably otherwise a bug or a mistake in the algorithm if I don't fix it intelligently. But what did we do back in the day? And what does your iPhone or Android phone do? What they typically do is go roughly to the middle and look physically or virtually down. They see, "Oh, I'm in the M section." And so, which side is John Harvard on? To the left or to the right? So, he's to the left. So, I could literally now... Jesus Christ. We talked about this before class, that this might be more... Oh my god. There we go. We can tear the problem in half. Thank you. [applause] It's been a while. We can tear the problem in half. We know that John Harvard is to the left. So, I can throw half of the problem away, dramatically, such that I've now gone from a thousand-page problem to 500 pages instead. What now can I do? I can go roughly to the middle here, and maybe I'm in the E section.
So, I went a little too far back to the left, but I kept it simple and I just divided so that I can conquer this problem, if you will. And if I'm in the E section now, is John Harvard to the left or to the right? To the right. So I can again, Jesus Christ, tear the problem in half. And now, thank you. So now John Harvard again is going to be in this half. I can throw this half away. So now I've gone from 1,000 to 500 to 250. And I can repeat, repeat, repeat, down to 125. Half of that, half of that, half of that, until I'm left with finally just a single page. And John Harvard is hopefully now on this page such that I can call him, or he's not there at all, at which point this is all sort of for naught. But what's powerful about each of those algorithms is that, sort of good, better, and best, they all get the job done, conditional on the second one having that little fix just to make sure I don't miss John Harvard between two pages, but they're fundamentally different in their efficiency and the quality of their design. And this is really representative of one of the emphases of a class like this. It's not just about writing correct code or getting the job done, but doing it well and doing it quickly. Using the least amount of CPU or computing resources, using the minimal amount of RAM, using the fewest number of people, using the least amount of money, whatever your constrained resource is, solving a problem better. So that first algorithm, those step-by-step instructions, was all about doing something like this. If we plot things on a grid like this, we have on the x-axis a representation of the size of the problem. So this would mean a small problem, like zero pages. This would mean a big problem, like a thousand pages. And on the y or vertical axis we have some measurement of time. So this is the number of seconds or the number of page turns, whatever your metric actually is. So this would be not much time at all, so fast. This would be a lot of time, so slow.
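As a rough illustration of the three approaches described above, here is a minimal Python sketch that counts worst-case page turns for each one. The function names are made up for this example, and the counts are idealized (one step per page turn or per tear), so treat it as a sketch under those assumptions rather than a measurement.

```python
def turns_one_at_a_time(pages: int) -> int:
    """First algorithm, worst case: turn every single page."""
    return pages


def turns_two_at_a_time(pages: int) -> int:
    """Second algorithm, worst case: half the pages, plus one extra
    step to double back in case the person was skipped over."""
    return pages // 2 + 1


def turns_by_halving(pages: int) -> int:
    """Third algorithm, worst case: how many times the book can be
    torn in half before only a single page is left."""
    turns = 0
    while pages > 1:
        pages //= 2   # throw half of the problem away
        turns += 1
    return turns


# Compare a 1,000-page book with one twice that size.
for pages in (1000, 2000):
    print(pages,
          turns_one_at_a_time(pages),
          turns_two_at_a_time(pages),
          turns_by_halving(pages))
```

Under these assumptions, doubling the book doubles the work for the first two approaches but adds only one more tear for the third.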
So what's the relationship if we just roughly draw these three algorithms? Well, the first one is technically a straight line. And we'll describe that as n. The slope is n because if you think of n as a number for the number of pages, well, there's a one-to-one relationship in the first algorithm as to how many times I have to turn the page based on how many pages there actually are. And you can think about this in the extreme. If I was looking for someone whose name started with Z, I might have to go through like a thousand darn pages to get to that person, unless again I do something hackish and just kind of cheat and go to the end. If we execute these algorithms again and again the same way, that's going to be pretty slow. But the second algorithm was pretty much twice as fast, plus that one extra step potentially. It's still a straight line, because if there's a thousand pages and I'm doing two pages at a time, well, that's like n divided by two steps, plus one, give or take. It's still a straight line, but it's still better. Notice, if this is the size of the problem, a thousand pages for instance, that the first algorithm took literally twice as much time as the second algorithm. So we're doing better already. But the third algorithm fundamentally is going to look something like this. And if you remember your logarithms, so to speak, sort of the opposite of an exponential, this curve is so much lower and flatter, if you will, than either of these two mathematically. More on this another time. The slope is going to be like log base 2 of n, or just logarithmic in nature. But what it means is that it's growing very, very, very slowly. It's still going up. It's never going to flatline and go perfectly horizontal, but it goes up very slowly. Why?
Well, if you think about two towns nearby, like Cambridge on this side of the river and the town of Allston on the other, suppose that they still have phone books like this one, and they merge their phone books for whatever reason. So, overnight, we go from a thousand-page phone book to a 2,000-page phone book. The first algorithm is going to take literally twice as long, as will the second one, because we're only going through it one or two pages at a time. But if the phone book size doubles from this year, for instance, to next year, you can kind of in your mind's eye think about the green line. It's not going to go up that much higher. Why? Well, practically speaking, even if the phone book becomes 2,000 pages long, how many more times do you have to tear or divide that problem in half? >> Just one. Because you're taking a 1,000-page bite out of it, or a 500, then a 250. You're taking much bigger bites out of it than just one or two at a time. And so what computer science and algorithms and good design are about is figuring out what is the logic via which you can solve problems not only correctly but efficiently as well. And that then gives us these things called algorithms. And when it comes time to code, which we're about to do too, code is just an implementation, in a language the computer understands, of an algorithm. Now, this assumes that we've come up with some digital way, that is to say a zero-and-one-based way, to represent names and numbers. But honestly, we already did that. We came up with ASCII and then Unicode to represent the names. Representing numbers is even easier than that. That's really where we started. So code is just about taking as input some standardized representation of names and numbers and spitting out answers. And that's truly what iOS and Android are doing. When you start doing autocomplete, they could be searching from the top to the bottom, which is fine if you've only got a few friends and family in the phone.
But if you've got a thousand, or if you've got 10,000, or if it's not a phone book anymore but some database with lots and lots of data, well, it stands to reason that it'd be nice if the computer kept it all alphabetized just like that book and jumped to the middle, then the middle of the middle, then the middle of the middle of the middle, and so forth. Why? Because the speed is going to be much, much faster, logarithmic in nature and not linear, so to speak. But we'll revisit those topics as well. But for now, before we get into actual code, let's talk for a moment about pseudocode. So pseudocode is not one formal thing. Every human will come up with their own way of representing pseudocode. It's an English-like or human-like formulation of step-by-step instructions, just using terse but correct English or whatever human language. So, for instance, suppose I want to translate what I did somewhat intuitively with that phone book, by just dividing in half, dividing in half, into step-by-step instructions I could hand to you, or nowadays to something like a robot. Well, step one was essentially to pick up the phone book, which I did. Step two was to open to the middle of the phone book, in the third and final algorithm. Step three was to look at the page, as I did. Step four got a little more interesting. Even though I didn't verbalize this, presumably I was asking myself a question. If the person I'm looking for, John Harvard, is on the page, then I would have called him right then. But if he weren't on the page, if he instead were earlier in the book, as did happen, well, then I'm going to go to the left, so to speak, but more methodically, I'm going to open to the middle of the left half of the book. Then I'm going to go back to line three. That's interesting. We'll come back to that in a moment. But else, if the person is later in the book, well, I'm going to open to the middle of the right half of the book and then go back to line three.
Now, let's pause here. Why do I keep going back to line three? This would seem to get me doing the same thing forever, endlessly. But not quite. Why? >> As soon as you hit the one... [inaudible] >> Yeah. So because I am dividing the problem in half, for instance, on line six or line nine, implicitly, just based on how I've written this, the problem's getting smaller and smaller and smaller. So it's fine if I keep doing the same logic again and again, because if the problem's getting smaller, eventually it's going to bottom out and I'm going to have just one person on that page that I want to call, and so the algorithm is done. But there is a perverse corner case, if you will, and this is where it's ever more important to be precise when writing code and anticipate what could go wrong. I should probably ask one more question in this code, not just these three. What might that question be? Yeah. >> John Harvard is in the book. >> Yeah. So, if John Harvard is not in the book, there's this corner case where I'm just wasting my time entirely and I get to the end of the phone book and John Harvard's not there. What should the computer do? Well, as an aside, if you've ever been using your Mac or PC or phone and the thing just freezes, or the stupid little beach ball starts spinning or something like that, and you're like, what is going on? Some human at Google or Microsoft or Apple or the like made a mistake. They forgot, for instance, that fourth uncommon but possible situation, wherein if they don't tell the computer how to handle it, the computer's effectively going to freak out and do something undefined, like just hang or reboot or do something else. So we do want to add this: else, quit altogether. So you have well-defined behavior. And truly, think of that the next time your computer or phone spontaneously reboots or dies or does something wrong. It's probably not your fault per se. It's that some other human elsewhere did not write correct code.
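The pseudocode walked through above, including the final "else quit" case, can be sketched in Python roughly as follows. This is one possible rendering, not the course's own code: the function name `find_page`, the idea of one name per page, and the sample names are all assumptions made for illustration.

```python
from typing import List, Optional


def find_page(book: List[str], name: str) -> Optional[int]:
    """Return the index of the page holding name, or None if absent.

    Assumes book is sorted alphabetically with one name per page.
    """
    left, right = 0, len(book) - 1
    while left <= right:                # loop: "go back to line 3"
        middle = (left + right) // 2    # open to the middle
        if book[middle] == name:        # if person is on page
            return middle               #     call person
        elif name < book[middle]:       # else if person is earlier in book
            right = middle - 1          #     keep only the left half
        else:                           # else if person is later in book
            left = middle + 1           #     keep only the right half
    return None                         # else quit: person is not in the book


book = sorted(["Alexis Smith", "Eve Jones", "John Harvard", "Mary Brown", "Zed Young"])
print(find_page(book, "John Harvard"))
print(find_page(book, "Nobody Here"))
```

Each pass through the loop discards half of what remains, which is why the loop cannot run forever: either the name turns up or the range becomes empty and the function quits with `None`.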
They didn't anticipate cases like these. But now let's use some terminology here. There are some salient ideas that we're going to see in Scratch and C and Python and these other languages I alluded to earlier. Everything I've just highlighted here, henceforth, we're going to think of as functions. Functions are verbs or actions that really get some small piece of work done for you. Here, though, highlighted is the beginning of what we'll call conditionals. A conditional is like a fork in the road. Do I go this way? Do I go this way? Or some other way altogether? How do you decide what road to go down? We're going to call these questions you ask yourself Boolean expressions, named after a mathematician, Boole. And a Boolean expression is just a question that has a yes or no answer, or a true or false answer, or a one or zero answer. It's just a binary state, yes or no typically. Otherwise, we have this "go back to," "go back to," which is what we're generally going to call a loop, which somehow induces cyclical behavior again and again. And those functions and those conditionals, Boolean expressions, and loops, and a few other concepts, are pretty much what will underlie all of the code that we write, whether it is in Scratch, C, or something else altogether. But we need to get to that point, and in fact, let's go and infer what this program here does. At the end of the day, computers only understand zeros and ones. So I claim here is a program of zeros and ones. What does it do? Anyone want to guess? I mean, we could spend all day converting all of these zeros and ones to numbers, but they're not going to be numbers if it's code. What do you think? >> That's amazing. It does in fact print hello world. All right. So, no one except like maybe you and me and a few others in the room should know, and that was probably a guess, admittedly, or me advancing the slide. But why is that?
Well, it turns out that not only do computers standardize information, data like numbers and letters and colors and other things, they also standardize instructions. And so, if you've heard of companies like Intel or AMD or Nvidia or others, among the things they do is decide as a company what pattern of zeros and ones shall represent what functionality. And it's very low-level functionality. Those companies and others decide that some pattern of zeros and ones means add two numbers together, or subtract, or multiply. Another pattern might mean load information from the computer's hard drive into memory. Another might mean store it somewhere else. Another might mean print something out to the screen. So nested somewhere in here, and admittedly I have no idea which pattern offhand because it's not interesting enough to go figure out at this level, is one that says print. And somewhere in there, like this gentleman proposed, I bet we could find the representation of H, which was 72, and E and L and L and O and everything that composes hello world. Because, as it turns out, in programming circles the very first program that students typically write is that of hello world. Now, this one here is written in a much more intelligible way. Even if you're not a programmer, odds are if I asked you what this program does, you would have said, "Oh, hello world." Even though there's a lot of clutter here, like no idea what this is until next week. int main(void). That looks cryptic. There are these weird curly braces, which we rarely use in the real world, but at least I understand a few words like hello and world. And this is kind of familiar. printf: it's not print, but it's probably the same thing. So, here too is an example of this hierarchy. Back in the day, in the earliest days of computers, humans were writing code by representing zeros and ones.
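To make the "H, which was 72" remark above concrete, here is a small Python sketch showing how the text of hello world could be written out as numbers and then as zeros and ones. It only covers the data (the letters), under the assumption of plain ASCII; the instruction patterns that mean things like "print" vary by processor and are not reproduced here. The helper name `to_bits` is hypothetical.

```python
def to_bits(text: str) -> str:
    """Return the 8-bit pattern for each ASCII character, space-separated."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


print(ord("H"))                  # the number that stands for capital H
print(to_bits("H"))              # that same number written as eight bits
print(to_bits("hello, world"))   # the whole message as zeros and ones
```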
If you've ever heard your parents talk about punch cards or the like, you're effectively representing patterns that tell the computer what to do or what to represent, literally as holes in paper. Well, pretty quickly, early on, this got really tedious, only writing code at such a low level. So, someone decided, you know what, I'm going to put in the effort. I'm going to figure out what patterns of zeros and ones I can put together so as to be able to convert something more user-friendly to those zeros and ones. And as a teaser for next week, that person invented the first compiler. A compiler is just a program that translates one language to another. And more modernly, this is a language called C, which we'll spend a few weeks on together because it's so fundamental to how the computer works. Even this is going to get tedious by like week six of the class. This is going to get stupid. This is going to get annoying. This is going to get cryptic. We're just going to write print hello on the screen in order to use a different language called Python. Why? Because someone wrote in C a program that can convert Python, this is a white lie, to C, which can then be converted to zeros and ones, and so forth. So in computing there's this principle of abstraction, where we start with the basics, and thank god we can all trust that someone else solved these really hard problems long ago. Then they wrote programs to make it easier. We wrote programs to make it easier. You can now write code, like I did with the chatbot, to make things even easier. Why? Because OpenAI and other companies have abstracted away a lot of the lower-level implementation details. And that's where I think this stuff gets really exciting. We can stand on the shoulders of others so long as we know how to use and assemble these kinds of building blocks. And speaking of building blocks, let's start here. Now, odds are some of you might have started here in like grade school, playing with Scratch.
And it's great for after-school programs, for learning how to program. And you probably used this language to make games and graphics and just maybe playful art or the like. But Scratch, which is a graphical programming language designed about 20 years ago by our friends down the road at MIT's Media Lab, represents pretty much everything we're going to be doing fundamentally over the next several weeks in more modern languages like C and Python, more textual languages, if you will. I bet I could ask the group here, what does this program do when you click a green flag? Well, it says hello world on the screen. Because with Scratch, you have the ability to express yourself with functions and loops and conditionals and all of this, but by using drag-and-drop puzzle pieces. So, what we're about to do is this. We're going to go on my screen to scratch.mit.edu. It's a browser-based programming environment, and we're only going to spend one week, really a few days, in CS50 on this language. But the overarching goal is, one, to make sure everyone's comfortable applying some of these building blocks and actually developing something that's interesting and visual and audio as well, but also to give us some visuals that we can rely on and fall back on when all of those curly braces and parentheses and sort of stupid syntax come back, which is necessary in many languages but can very quickly become a distraction early on from the interesting and useful ideas. So what we're about to see is this, in a browser. This is the Scratch programming environment, and there are a few different parts of this world. This is the blocks palette, so to speak. That is to say, there's a bunch of puzzle pieces or building blocks that represent functions and conditionals and loops and other such constructs. There's going to be the programming area here, where you can actually write your code by dragging and dropping these puzzle pieces. There's a whole world of sprites here.
By default, Scratch is a cat, by design, but you can make Scratch look like a dog, a bird, a garbage can, or anything else, as we'll soon see. And then this is the world in which Scratch itself lives. So Scratch can go up, down, left, right, and generally be animated within that world. For the curious, kind of like high school geometry class, there's sort of this XY plane here. So 0, 0 would be in the middle. 0, 180 is here. 0, -180 is here. -240, 0 is here, and positive 240, 0 is here. Generally, you don't need to worry about the numbers, but they exist so that when you say up or down, you can actually tell the program to go up one pixel or 10 pixels or 100 pixels, so that you have some definition of what this world actually is. All right, so let's actually put this to the test. Let me go ahead here and flip over to the actual Scratch website, whereby I'm going to have on my screen in just a moment that same user interface, once I've logged in, via which I can actually write some code of my own. Let me go ahead and zoom in on the screen a little bit here, and let's make the simplest of these programs first. Maybe a program that simply says hello world. Now, at a glance it's kind of overwhelming how many puzzle pieces there are. And honestly, even over 20 years, I've never used them all. And MIT occasionally adds to it. But the point is that they're color-coded to resemble the type of functionality that they offer. And also, it's meant to be the sort of thing where you can just kind of scroll through and get a visual sense of what you could do and then figure out how you might assemble these puzzle pieces together. So, I'm going to go under this yellow or orangish category here to begin with. So, there exists in the world of Scratch not quite the same jargon that I'm using now: functions and conditionals and loops. That's more of the programmer's way. This is more of the child-friendly way, but it's really the same idea.
Under events, you have puzzle pieces that represent things that can happen while the world is running. So, for instance, the first one here is sort of the canonical "when green flag clicked." Why is that relevant? Well, in the two-dimensional world that Scratch lives in, there's a stop sign, which means stop, and there's a green flag, which means go. So, I can therefore drag one of these puzzle pieces over here so that when I click that green flag, the cat will in fact do something for me. It doesn't really matter where I drop it, so long as it's somewhere in the middle here. I'm going to go ahead and let go. Now, I want the look of the cat to change. I want to see a cartoon speech bubble come out, for now. So, I'm going to go under looks here. And there's a bunch of different ways to say things and think things. I'm going to keep it simple and just drag this one here. And now notice, when I get close enough to that first puzzle piece, they're sort of magnetic and they want to snap together. So I can just let go, and boom, because they're a similar shape, they will lock together automatically. And notice too, if I zoom in here, the white oval, which by default says hello, is actually editable by me, because it turns out that some functions can take arguments, or more generally inputs, that influence their behavior. So, if I click or double-click on this, I can change it to the more canonical hello world, or hello David, or hello whatever I want the message to be. I'm going to go ahead and zoom out. And now over here at top right, notice that I can very simply click the green flag, and I'll have written my first program in Scratch. I clicked the green flag, it said go. And now notice it's sort of stuck on that, because I never said stop saying go. But that's where I can click the red stop sign and sort of get the cat back to where I want it. So think for just a moment about what it is we just did.
So on the one hand, we have a very obvious puzzle piece that says say, and it said something, but it really is a function, and that function does take an input, represented by the white oval here, otherwise known as an argument or a parameter. But what this really is is just an input to the function. And so we can map even this simple Scratch program onto our model of problem solving from before, with the addition of what we'll call, moving forward, a side effect. A side effect in a computer program is often something that happens visually on the screen or maybe audibly out of a speaker. It's something that just kind of happens as a result of you using a function, like a speech bubble appearing on the screen. So here, more generally, is what we claimed represents the solving of a problem. And let's just consider what the input is. The input to this problem, say something on the screen, is this white oval here that I typed in: hello world. The algorithm, the step-by-step instructions, are not something I really wrote. Our friends at MIT implemented that purple say block. So someone there knows how to get the cat to say something out of its comical mouth. So the algorithm implemented in code is really equivalent to the say function. So a function is just a piece of functionality implemented in code, which in turn implements an algorithm. So the algorithm is sort of the concept, and the function is actually the incarnation of it in code. What's the output? Well, hopefully it's this side effect: seeing the speech bubble come out of the cat's mouth like this. All right, so that's one such program, but it's always going to play and look the same. What if I actually want to prompt the human for their actual name? Well, let me go back to the puzzle pieces here. Let me go ahead and throw this whole thing away. Okay. And if you want to delete blocks, you can either right-click or control-click and choose from a menu.
Or you can just drag them there and sort of let go, and they'll disappear. I'm going to go back in and get another event block, even though I could have reused that same one. I'm going to go ahead and go under sensing now. And if I zoom in over here, you'll see a whole bunch of things, like I can sense distance and colors. But more pragmatically, I can use this function in blue: ask something, and then wait for the answer. And what's different about this puzzle piece is that it too is, yes, a function. It too takes an argument, but instead of having an immediate side effect like displaying something on the screen, it's essentially, inside of the computer, going to hand me back the response. It's going to return a value, so to speak. And a return value is something that the code can see, but the human can't. A side effect is something the human sees, but a return value is something only the computer sees. It's like the computer is handing me back the user's input. So, how does this work? We'll notice, and this is a bit strange, this isn't usually how variables work, but Scratch, too, supports variables, and that was a word I used quickly at the very start when we were making the chatbot. A variable, like in math, x, y, or z, just stores some value, but it doesn't have to store a number. In code, it can store something like a human name. So, what's going to happen when I use this puzzle piece is that once the human types in their name and hits enter, MIT, or really Scratch, is going to store the answer, the so-called return value, in a variable that's designed to be called answer. But, as we'll see, you can make your own variables down the line if you want and call them anything you want. But let me go ahead and zoom out. Let me drag this over here. I'm going to use the default question, what's your name? But I could certainly change the text there. And let me go under looks again.
Let me go ahead and grab the say block, and let me go ahead and say, just for consistency, "hello," with a comma. Okay? And now let me go under sensing, maybe. How do I want to say this answer? Well, notice this. The shapes are important. This too is an oval, even though it's not white, but that's just because it's not editable. It's going to be handed to me by the ask function. Let me zoom out and grab a second say block like this. And notice it will magnetically clip together. I don't want to say hello again, so I could delete that. But now it's still the same shape, even though it's a little smaller. Let me go back to sensing. And notice what can happen here. When you have values like words inside of a so-called variable, you can use those instead of manual input at your keyboard. And notice it too wants to magnetically snap into place. It'll grow to fit that variable because the shape is the same. And now let's do this. Let me click the green flag at right. I'm seeing, quote unquote, what's your name? I'm getting a text box this time, like on a web page for instance. Let me type in my name, and watch closely what comes out of the cat's mouth as soon as I click the check mark or hit enter. Huh. Okay, I got my name right, but let me do it once more. Let me stop and start. D-A-V-I-D. Enter. No, it didn't work. Let me try one other. Maybe it's my name. Let's try Kelly. Enter. What's missing? Obviously, the hello. There's a bug, a mistake in this program. But what explains this? Even if you've never programmed before, intuitively, what could explain why I'm not seeing hello? >> Exactly. It's on two different lines. So, it's doing one after the other. So, it is happening. It's just that you and I, as the slowest things in the room, are not seeing it in time because it's happening so darn fast. Because my computer is so, you know, so new and so fast, it's happening, but way too quickly. So, how can we solve this? We can solve this in a few different ways.
And this is where, in Scratch at least, for problem set zero, you'll have an opportunity to play around with this. I can scroll around here and, okay, under control I see something like wait. So I can just kind of slow things down. And now notice too, if you hover over the middle of two blocks, if it's the right shape, it'll just snap into the middle too. Or you can just, so you know, kind of drag things away to magnetically separate them. But this might solve this. So let me hit stop and then start. D-A-V-I-D. Enter. Hello, David. All right, that was a little... Let's do like maybe two seconds to see it again. Green flag. D-A-V-I-D. Enter. Hello, David. All right, it's working better. It's sort of more correct because I'm seeing the hello and the David, but it's kind of stupid, right, to see one and then the other. Wouldn't it be nice to say it all in one breath, so to speak? Well, here's where we can maybe compose some ideas. So, let me get rid of this wait and the additional block. Let's confine ourselves to just one say block. But let me go down to operators, where we haven't been before. And this is interesting. There's this bigger oval here that says join two things, like apple and banana. And those are just random placeholder words that you can override with anything you want. But they're both ovals and white, which means I can edit them. So let me go ahead and do this. Let me drag this on top of the say block. And this is therefore just going to override the hello I put there. Now, I don't want to say apple or banana, but I do want to say hello, and I then want to say my name. Okay, so now I can go back to sensing, go back to answer, and drag and drop this here. That'll snap into place. And let me zoom in. Now what I've done is take a function and, on top of it, nest another function, the join function, that takes two arguments or inputs and presumably joins them together, as per its name. So let's see what this does for us. Let me click stop and start.
I'll type in David, enter. And it's so close. Now, this is just kind of an aesthetic bug. What have I done wrong here? There's no space. So, it looks a little wrong, but that's an easy fix. I just need to literally go into the hello block, after the comma, and hit the space bar, so that now when I stop and start again and type in David, I see something that's closer to the grammar we might typically expect here. All right. So, let's model this after what we just saw earlier. We've now introduced a so-called return value. And this return value is something we can then use in the way we want. It's not happening immediately like the speech bubble. It's clearly being passed to me in some way that I can use to plug in somewhere else, like into that join block. So if we consider the role these variables are playing, let's consider the picture now as follows. If the input now to the first function, the ask block, is "what's your name?", quote unquote, that's indeed being fed into the ask block. And the result this time is not a speech bubble. It's not some immediate visual side effect. It is the answer itself, stored in a so-called variable, as represented by this blue oval. Meanwhile, what I want to do is combine that answer with some text I came up with in advance, by kind of stacking these things together. Now, visually in Scratch, you're stacking them on top, but it's really that you're passing one into the other into the other, much like in math, when you have parentheses and you're supposed to do what's inside the parentheses and then work your way out. Same idea here. You want to join hello and answer together. And whatever that output is then becomes the input to the say block, which, like in math, is outside of the join block itself. So pictorially, it might now look like this. There are two inputs to this story: hello, comma, space, and the answer variable. The puzzle piece in question is join.
Its goal in life had better be to give me the full phrase that I want: hello, David. Let's shift everything over now, because that output is about to become the input to the say block, which itself will now have the so-called side effect. And so this too is what programming, and in turn what computer science, is about: composing, with the solutions to smaller problems, solutions to bigger problems using those component pieces. And that's what each of these puzzle pieces represents: a smaller problem that someone else, or maybe even you, has already solved. Now, we can kind of spice things up here. If I go back to Scratch's interface, we don't have to use just the puzzle pieces here. I can do something like this. Let me go ahead and drag these apart and get rid of the say block down here. Just for fun, there are all these extensions that you can add over the internet to your own Scratch environment. And if I go to text to speech down here, I can, for instance, use a speak block instead of a say block, colored here in green. I can now reconnect the join block in here. And if we could raise the volume just a little bit. Let me stop the old version, start the new version, type in my name, and hear what Scratch actually sounds like. >> Hello, David. >> Okay, not very cat-like, but we can kind of waste some time on this by dragging in the "set voice to" block. And I can put this anywhere I want above the speak block. So, I'm just going to put it here, even though I've already asked a question. Maybe kitten sounds appropriate. Let's try again. D-A-V... >> meow meow. >> Okay. And then let's see, giant, a little creepier. Here we go. D-A-V-I-D. And lastly, >> hello David. >> All right. A little ransom-like instead. All right. So, that's just some additional puzzle pieces, but really just the same idea, and I like that we've introduced some sound. So, let's do this.
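The ask, join, and say composition described above, where one function's return value becomes the next function's input, might look like this in Python. The three functions are stand-ins written for this example to mirror the Scratch blocks, not part of Python itself, and the `reader` parameter is an assumption added so the example can run without a keyboard.

```python
def ask(question: str, reader=input) -> str:
    """Like the ask block: pose a question and RETURN what was typed.
    Nothing is shown as a result; the value is handed back to the code."""
    return reader(question + " ")


def join(first: str, second: str) -> str:
    """Like the join block: return the two inputs glued together."""
    return first + second


def say(message: str) -> None:
    """Like the say block: its job is a SIDE EFFECT (text on the screen),
    so it returns nothing useful."""
    print(message)


# A canned reader stands in for the human typing "David" and hitting enter.
answer = ask("What's your name?", reader=lambda prompt: "David")

# Work from the inside out, as with parentheses in math:
# join's output becomes say's input.
say(join("hello, ", answer))
```

With the default `reader`, `say(join("hello, ", ask("What's your name?")))` would prompt at the keyboard instead. Dropping the space after the comma in `"hello, "` would reproduce the aesthetic bug from the demo.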
Let me go ahead and throw away a lot of those puzzle pieces, leave ourselves with just the when green flag clicked, and play around with some other building blocks that we've seen already thus far. Let me go ahead, for instance, under sound, and let's make the cat actually meow. So, it turns out Scratch, being a cat by default, comes with some sounds by default, like meowing. So, if we go ahead and click the green flag after programming this program, let's hear what he sounds like now. Okay, kind of cute. And if you want Scratch to meow twice, you can just play the game again. And a third time. All right, but that's going to get a little tedious, as cute as it is. So, I can solve that. Let's just grab three of the puzzle pieces and just drag them together and let them connect. And now click the green flag. All right. It gets less cute quickly, but maybe we can slow it down so that the cat doesn't sound so hungry. Let me go under, let's see, under control. Let's grab one of those wait one second blocks and maybe plop a couple of these in the middle here. That might help things. And now click the green flag. Okay. Still a little hungry, but let's see if we change it to two. And then I change it to two down here in both places. Let's play it again. Okay, cuter maybe, but now I'm venturing into badly programmed territory. This is correct if my goal is to get the cat to meow three times, pausing in between. What is bad about this code, even if you've never programmed before? Yeah, in the middle. >> Yeah, I literally had to repeat myself three times, essentially copy-pasting. And frankly, I could have been really lazy and I could right-click or control-click and I could have chosen duplicate. But generally, when you copy-paste code or when you duplicate puzzle pieces, you're probably doing something wrong. Why? It's solving the problem correctly, but it's not well designed. 
If only because when I changed the number of seconds, I had to change it in two places. So, I had one initially, then I had to change it to two. And if you just imagine in your mind's eye having not six puzzle pieces but 60 or 600 or 6,000, you're going to screw up eventually if it's on you to remember to change something here and here and here and here. Like, you're going to mess up. It's better to keep things simple and ideally centralized by factoring out common functionality. And clearly playing sound and waiting is something I'm doing at least twice, if not a third time here as well. So how can we do this better? Well, remember this thing, loops. Maybe we can just do something a little more cyclically. So I tell the computer to do something once, but I tell it how many times to do that altogether. So notice here, by coincidence, under control I have a repeat block, which doesn't say loop, but that's certainly the right semantics. Let me go ahead and drag the repeat block in and I'll change the 10 to three just for consistency here. I'm going to go back to sound. I'm going to go ahead and play sound meow until done just as before. And just so it's not meowing too fast, under control, I'm going to grab a wait one second and keep it inside the loop. And notice that the loop here is sort of hugging these puzzle pieces by growing to fill however many pieces I actually cram in there. So now if I click play, the effect is going to be the same, but it's arguably not only correct, but also well designed because now if I want to change the wait, change it in one place. If I want to change the total number of times, change it in one place. So I've modularized the code and made it better designed in this case. But now this is silly because even though I want the cat to meow, it feels like in any program in which I want this cat to meow, I have to make these same puzzle pieces and connect them together. 
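The repeat block version can be sketched in Python as follows. This is a rough analogy under stated assumptions: the constants `REPEATS` and `PAUSE_SECONDS` and the function `run` are hypothetical names, and appending the word "meow" to a list stands in for actually playing the sound.

```python
import time

REPEATS = 3        # change the total number of meows in one place
PAUSE_SECONDS = 1  # change the wait in one place


def run(sleep=time.sleep):
    """Meow REPEATS times, pausing after each one, and return what was 'played'."""
    sounds = []
    for _ in range(REPEATS):
        sounds.append("meow")  # stand-in for "play sound Meow until done"
        sleep(PAUSE_SECONDS)   # stand-in for "wait 1 seconds"
    return sounds
```

The point of the design is that each number lives in exactly one place, so there is nothing to keep in sync by hand. The `sleep` parameter exists only so the pause can be swapped out, for example when trying the sketch without waiting.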
Wouldn't it be nice to invent the notion of meowing once and then actually have a puzzle piece called meow? So when I want the cat to meow, it will just meow. Well, I can do that, too. Let me scroll down to my blocks here in pink. I'm going to click make a block and I'm going to literally make a new puzzle piece that MIT didn't think of called meow. And I'm going to go ahead and click okay. Now I have in my code area here a define block, which literally means define meow as follows. So how am I going to do this? Well, I'm going to propose that meowing just means to play the sound meow until done and then wait one second. And notice now I have nothing inside my actual program, which begins when I click the green flag. But notice at top left, because I made a block called meow, I now have access to one that I can drag and drop. So now I can drag meow into this loop. And per my comment about abstracting the lower level implementation details away, I'm going to sort of unnecessarily dramatically just move that out of the way. It still exists. I didn't delete it, but now it's out of sight, out of mind. Now, if you agree with me that meow means for the cat to make a sound, we've abstracted away what it means mechanically for the cat to make that sound. And so, we now have our own puzzle piece that I can just now use forever because I invented the meow block already. Now, I can do one better than this. It would be nice if I could just tell the meow block how many times I want it to meow because then I don't need to waste time using loops myself either. So, let me do this. Let me zoom out and let me go back to my define block. Let me right-click or control-click and just edit it. Or I could delete it and start over, but I'll just edit it. And specifically, let me say, you know what, let's add an input, otherwise known as an argument, to this meow block. And we'll call it maybe n for the number of times I want it to meow. 
And just to be super clear, I'm going to add a label, which has no functional impact, but it just helps me remember what this does. So, I'm going to say meow n times, so that when I see the puzzle piece, I know what the n actually represents. If I now click okay, my puzzle piece looks a little different at top left. Now it has the white oval into which I can type or drag input. Notice down here in the define block, I now see that same input called n. So what I can do now is this. Let me go under control and drag the repeat block here. And I have to do a little switcheroo. Let me disconnect this. Plug it inside of the repeat block. Reconnect all of this. And I don't want 10. And heck, I don't even want three down here anymore. I can drag this input because it's the right shape. And now declare that meowing n times means to repeat the following n times. Play sound meow until done. Wait one second and keep doing that n total times. If I now zoom out and scroll up, notice that my usage of this puzzle piece has changed such that I don't actually need the repeat block anymore. I can disconnect this. And heck, I can actually right-click or control-click and delete it, and just use this under the green flag. Change this to a three. And now I have the essence of this meowing program. The implementation details are out of sight, out of mind. Once they're correct, I don't need to worry about them again. And this is exactly how Scratch itself works. I have no idea how MIT implemented the wait block or the repeat block. Heck, there's a forever block and there's a few others, but I don't need to know or care because they've implemented those building blocks that I can then use myself. I don't necessarily know how to build a whole chatbot, but on top of OpenAI's API, this web-based service, I can implement my own chatbot because they've done the heavy lift of actually implementing that for me. Well, let's do just a few more examples here. 
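The custom "meow n times" block corresponds roughly to defining a function that takes an argument. Here is a minimal sketch in Python, assuming `print` as a stand-in for the sound; the `pause_seconds` and `sleep` parameters are additions for illustration and are not part of the Scratch block.

```python
import time


def meow(n: int, pause_seconds: float = 1, sleep=time.sleep) -> int:
    """Meow n times, pausing after each meow. Return how many meows happened."""
    count = 0
    for _ in range(n):        # "repeat n"
        print("meow")         # stand-in for "play sound Meow until done"
        sleep(pause_seconds)  # stand-in for "wait 1 seconds"
        count += 1
    return count
```

With that definition out of sight, the whole program under the green flag reduces to the single call `meow(3)`. Whoever calls it does not need to know or care that a loop is inside.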
Let's bring the cat all the more to life. Let me throw away the meowing. Let me start over under when green flag clicked. How about that forever block that we just glimpsed? Let me go ahead and now add to the mix what we called earlier conditionals, which allow us to ask questions and decide whether or not we should do something. So under this, let me go ahead and under forever say if the following is true. Well, what boolean expression do I want to ask? Well, let's implement this program and we'll figure out if it works. Under sensing, I'm going to grab this very angled puzzle piece called touching mouse pointer, that is, the cursor. And only if that question has a yes answer do I want to play the sound meow until done. So let me zoom in here. And in English, what is this going to implement? Really, just describe what this program does, less arcanely than the code itself. Yeah. >> Yeah, if you move the mouse over the cat, it will make noise. So, it's kind of like implementing petting a cat, if you will. So, let me zoom out, click the green flag, and notice nothing's happening yet, but notice my puzzle pieces are highlighted in yellow because it is in fact still running because it's doing something forever. And it's constantly checking if I'm touching the mouse pointer. And if so, it's like I just pet the cat. Now, it stopped until I move the cursor again. Now, it stopped. If I leave it there, it's going to keep meowing because it's going to be stuck in this loop forever. But it's correct insofar as I'm petting the cat. Let me do this though. Let me make a mistake this time. Let me forget about the forever and just do this. And you might think this is correct. Let me click the green flag now. Let me pet the cat. And like nothing's actually working here. Why though, logically? Yeah. >> Yeah. The program's so darn fast. It already ran through the sequence. And at the moment in time when I clicked the green flag, no, I was not touching the mouse pointer. 
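The difference between asking the question once and asking it forever can be sketched in Python. This is a simulation under assumptions: the list `timeline` is a made-up record of whether the cursor is touching the cat at each instant, and a finite loop stands in for Scratch's endless forever block.

```python
def check_once(touching_over_time):
    """Without 'forever': the question is asked a single time, at the first instant."""
    meows = 0
    if touching_over_time and touching_over_time[0]:
        meows += 1
    return meows


def check_forever(touching_over_time):
    """With 'forever': the same question is asked again at every instant."""
    meows = 0
    for touching in touching_over_time:  # finite stand-in for an endless loop
        if touching:
            meows += 1
    return meows


# Hypothetical timeline: the cursor reaches the cat only at the third instant.
timeline = [False, False, True, True, False]
```

Here `check_once(timeline)` counts no meows, because the only check happens before the cursor arrives, while `check_forever(timeline)` counts two.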
And so it was too late by the time I actually moved the cursor there. But by using the forever block, which I did correctly the first time, this ensures that Scratch is constantly checking the answer to that question. So if and when I do pet the cat, it will actually detect as much. All right, how about a few final examples before you're on your way building some of your own first programs with these building blocks? Let me go ahead and open up a program that I wrote in advance, in fact about 20 years ago, whereby, let me pull this up, we have in this example a program I wrote called Oscar time. And this was the result of our first assignment in this class whereby, when MIT was implementing Scratch for the very first time, we needed to implement our very own Scratch program as well. I'm going to go ahead and full screen it here. The goal is to drag as much falling trash as you can to Oscar's trash can before his song ends. For which one volunteer would be handy here. Okay. I saw your hand go up quickly in blue. Yeah. Come on up. All right. So, you're playing for a stress ball here, if you will. At some point, I'm going to talk over what you're actually playing just so that we can point out what it is we're trying to glean from this program. And I'll stipulate this probably took me like 8 to 12 hours. And as you'll soon see, the song starts to drive you nuts after a while because I was trying to synchronize everything in the game to a childhood song with which you might be familiar. Let me go ahead and say hello if you'd like to introduce yourself. >> Oh, hello. So, I'm Han and I'm a first year student. I'm pretty excited for this class. >> All right, welcome. Well, here is Oscar time. If you want to go ahead and take control of the keyboard, all you'll need to do is drag and drop trash that falls from the sky into the trash can. [music] 
And it's around this point in the game where the novelty starts [music] to wear off because there's like three more minutes of this game where more and more stuff starts to fall from the sky. So as Han, as you continue to play, I'm going to cut over here. You keep playing. Let's consider how I implemented this whereby we'll start at the beginning. The very first thing I did when implementing Oscar time honestly was the easy part. Like I found a lamp post that looked a little something like this and I made the so-called costume for the whole stage. And that was it. The game didn't do anything. You couldn't play anything. You put your green flag, nothing happened. But then I figured out how to turn the scratch cat, otherwise known more generally as a sprite, into a trash can instead. And so the trash can, meanwhile, is clearly animated because I realized that, oh, I can give sprites like the cat different costumes. So, I can make the cat not only look like a trash can, but if I want its lid to go up, well, that's just another costume. And if I want to see Oscar popping out, that's just a third costume. And so, I made my own simplistic animation. And you can kind of see it. It's very jittery step by step by step by creating the illusion of animation by really just having a few different images or costumes on Oscar. Now, I hope you appreciate how much effort went involved into timing each of these pieces of trash with the specific mention of that type of piece of trash in the music. Okay. 20 years later, still clinging. So, you're doing amazing, by the way. How do we get the trash to fall in the first place? Well, at the very beginning of the game, the trash just started falling from some random location. What does it mean for trash to fall from the sky? Oh, big climax here. You got a lot of trash on the ground to pick up. There we go. And your final score is a big round of applause if we could for Han. [applause and cheering] Thank you. Thank you. 
So just to be clear now, let's decompose this fairly involved program that took me a lot of hours to make into its component parts. So this is just a sprite. And I figured out eventually how to change its costume, change its costume, change its costume to simulate some kind of animation. And I also realized that, oh, I don't need to just have one sprite or one cat or trash can. You can create a second sprite, a third sprite, and many more. So I just told the sprite to go to a random location at Y equals 180 and X equals something. I think I restricted X to be in this region, which is why the trash never falls from over here. I just did a little bit of math based on that Cartesian plane that we saw a slide of earlier. And then I probably had a loop that told the trash to move a pixel, move a pixel, move a pixel down, down, down, down until it eventually hits the bottom and therefore just stops. So we can actually see this step by step. And this is representative of how, even for something like your first problem set in CS50 and with Scratch specifically, you might build some of the same. So, I'm going to go back into CS50 Studio for today, which is linked on the course's website, which has a few different versions of this and other programs called Oscar 0 through Oscar 4, where zero is the simplest. And truly, I meant it: when I look inside this program to see my code, this was it. There was no code because all I did was put the sprite on the screen and change it from a cat to a trash can. And I added a costume for the stage, so to speak, so that the lamp post would be fixated there. If I then go to the next version of code, version one, so to speak, then I had code that did this. Now, notice there's a few things going on here. At bottom left, you'll see of course the trash can and then at top right the trash. Here are the corresponding sprites down here. 
So, when Oscar is clicked on here, the trash can, you see the code I wrote, the puzzle pieces I dragged for Oscar. And in a moment, when we click on trash, you'll see the code I wrote or the puzzle pieces I wrote dragged and dropped for the trash piece specifically. So what does Oscar do? Well, I first switch his costume to Oscar 1, which I assume is this the closed trash can. Then forever Oscar does the following. If Oscar's touching the mouse pointer, then change the costume to Oscar 2. Otherwise, that is if not touching the mouse pointer, change the costume to Oscar 1. Well, what's the implication? Anytime I move the cursor over the trash can, the lid just pops up, which was exactly the animation I wanted to achieve. Meanwhile, if we do this and click the green flag, you can see that in action, even for this simple version. If I move the cursor over Oscar, we have the beginnings of a game, even though there's no score, there's no music or anything else, but I've solved one of my problems. Meanwhile, if I click on the trash piece here, and then you'll see no code has been written for it yet. So, we move on to Oscar version two and see inside it. In Oscar version two, when I click on trash, ah, now there's some juicy stuff happening here. And in fact, this trash sprite has two programs or scripts associated with it. And that's fine. Each of them starts with when green flag clicked, which means the piece of trash will do two things at once essentially in parallel. The first thing it will do is we'll set drag mode to dragable. And that's just a scratch thing that lets you actually move the sprites by clicking on them, making them dragable. Then it goes to a random X location between 0 and 240. So yeah, that must be what I did from the middle all the way to the right. And I set y always to 180, which is why the trash always comes from the sky from the very top. Then I said forever change your y by negative one. 
And here's where it's useful to know what 180 is, 240 is, and so forth. Because if I want the trash to go down, so to speak, that's changing its Y by a pixel by a pixel by a pixel. And thankfully MIT implemented it such that if the trash tries to go off the screen, it will just stop automatically, even if it's inside of a forever block, lest you lose control over the sprites altogether. But in parallel, what's happening is this. Also, when the green flag is clicked, the trash piece is doing this too, forever. If touching Oscar, what's it doing in blue here? Sort of teleporting away. Now, to your eye, hopefully it looks like it's going into the trash can. But what does that mean to go into the trash can? Well, I just put it back into the sky as though a new piece of trash is falling. So even though you saw one piece of trash, two, three, four, and so forth, it's the same sprite just acting that out again and again. So here, if I click play on this program, you'll see that it starts falling one pixel at a time. Because it's draggable, I can sort of pull it away and move it over to the trash can like that. And as soon as I do, it seems to go in, but really it just teleported to a different X location, still at Y equals 180. Again, it's not much of a game yet. There's no score. There's no music or anything, but let's go to Oscar 3 now. And in Oscar 3, if we scroll over to the trash, even more is happening here. Insofar as I realized, you know what? There was kind of an inefficiency before. Previously, I had these two programs, or scripts, synonymously, whereby they both went to the top by going to 0 to 240 for X and then 180 for Y. And if you noticed, I used that here and I used that down here in both programs. Now that too is kind of stupid because I literally copied and pasted the same code. So if I ever want to change that design, I have to change it in two places, and I already proposed that we frown upon that. So what did I do in this version? 
I just created my own block and I decided to call my own function go to top. What does it mean to go to the top? Pick a random x between those values and fixate on y equals 180 initially. Now in both of those programs, which are otherwise identical, I just say what I mean. Go to top. Go to top. And if I really wanted to, I could drag this out of the way and never think about it again because now that functionality exists. So correct, but arguably better designed. I've now factored out commonality so as to use and reuse my code as well. So let's go up to Oscar version 4 now. And in Oscar time version 4, the trash does a little something more. What have I added to this mix, even though we haven't dragged this puzzle piece together before? Yeah. What's new? >> Score. >> Yeah. So, it turns out on the left here, there's a variables category, which goes beyond the answer variable that we just automatically get from the ask block. You can create your own variables X, Y, Z. But in computing and programming, it's best to name things not silly simple words like X, Y, and Z, but full-fledged words that say what they are, like score. So, I'm setting a score variable to zero. And then any time the trash is touching Oscar, before it teleports away to the top, I change the score by one. That is, increment the score by one. And what Scratch does automatically for me is it puts a little billboard up here showing me the current score. So if I now play this game once more, the score is going to start at zero. But if I drag this trash over here and even let it fall in, as soon as it touches, the score goes to one. And now if I click and drag again, the score is going to go to two as soon as it touches Oscar, and so forth. And you saw in the final flourish with Han playing what you get once you add the sound and other pieces of trash, which are really just other sprites, and I just had them wait like a minute, wait two minutes so that the trumpet would fall at the right time. 
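The falling trash, the factored-out go to top block, and the score variable can be pulled together in a short Python sketch. It is a simulation under assumptions, not the original project: the class `Trash`, the function `play`, and the list of per-frame "touching Oscar" answers are hypothetical, and the bottom edge of the stage is assumed to sit at y = -180.

```python
import random

TOP_Y = 180             # trash always starts at the very top
MIN_X, MAX_X = 0, 240   # x is restricted to the right half of the stage
BOTTOM_Y = -180         # assumed bottom edge, where a falling sprite stops


class Trash:
    def __init__(self, rng: random.Random):
        self.rng = rng
        self.x = 0
        self.y = 0
        self.go_to_top()

    def go_to_top(self) -> None:
        """The one shared definition of 'go to top', reused wherever needed."""
        self.x = self.rng.randint(MIN_X, MAX_X)
        self.y = TOP_Y

    def fall(self) -> None:
        """Change y by -1, but never past the bottom of the stage."""
        self.y = max(BOTTOM_Y, self.y - 1)


def play(trash: Trash, touching_oscar_frames) -> int:
    """Run one frame per entry; each entry says whether the trash touches Oscar."""
    score = 0
    for touching in touching_oscar_frames:
        trash.fall()
        if touching:
            score += 1         # "change score by 1"
            trash.go_to_top()  # teleport away as if new trash were falling
    return score
```

For example, `play(Trash(random.Random(0)), [False, False, True, False, True])` ends with a score of 2. In Scratch the falling and the touching check run as two parallel scripts, whereas this sketch interleaves them in a single loop for simplicity.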
I've broken down a fairly involved program into these basic building blocks. And when you too write your own program, that's exactly how you should approach it. Even if you have these grand aspirations to do this or that, start with the simple problems and figure out what bites you can bite off in order to make progress. Baby steps, if you will, to the final solution. Well, let's look at one other set of examples before we have one final volunteer come up. And as you'll soon see, it's tradition in CS50 to end the first class with cake. So, in a moment, cake will be served out in the transept. And please feel free to come up and say hi and ask questions if you'd like to. Let me go ahead and open up, though, a series of building blocks here via which we can make so-called Ivy's hardest game, which is one implemented by one of your predecessors, a former classmate from CS50. So here we have a whole bunch of puzzle pieces written by your classmate, but let me go ahead and zoom in on this screen. You'll see that this Harvard crest is my sprite. So it's not a cat, it's not a trash can, it's a Harvard crest, and it exists in a very simple two-dimensional world with two walls next to it. If I click on the green flag, notice that with my hands here, I can go up, I can go down, I can go left, and I can go right. But if I try going too far right, I get stuck on the wall. If I go too far left, I get stuck on the wall. Well, it's sort of the beginning of any animation or game. But how do I do this? Well, let me go up here and propose that the first thing the Harvard sprite is doing is it's going to the middle, 0 comma 0. And it's then forever listening for the keyboard and feeling for walls. Now those are functions I implemented myself to kind of describe what I wanted the program to do. And let's do the shorter one first. What does it mean to feel for the walls? Just to ask the question, if you're touching the left wall, change your x by one. 
If you're touching the right wall, change your x by negative one. Why have I defined touching walls in this weirdly mathematical way? Yeah. >> Sure. Yeah. >> Like counteracts the movement. Otherwise, you're like not moving. >> Exactly. Because if I've gone so far right that I'm touching the right wall, well, I'm already kind of on top of the wall a little bit. So, I effectively want the sprite to bounce off of it. And the easiest way to do that is just to say back up one pixel as though you can't go any further. And same for the left wall. Meanwhile, let me scroll over to the second script or program that's running in parallel. It's a little longer, but it's not more complicated. What does it mean to listen for keyboard? Well, just check. If the key up arrow is pressed, change Y by one. Arrow go up. Else if the key down arrow is pressed, then change Y by negative 1. Key right arrow is pressed, change X by one, and so forth. So again, this is where the math and the numbers are useful because it gives you a world in which to live. Up, down, left, right. deconstructed into some simple arithmetic values. All right, so the net result is that we have a crest living in this world. Well, let's add a bit of competition here. And in the second version of this game, let me go ahead and full screen it again. Click play. And now we'll see sort of an enemy bouncing back and forth autonomously. So there's no one playing except me. I'm controlling Harvard. Yale is bouncing on its own. And nothing bad's going to happen if it hits me. But it does seem to be autonomous. So how is this working? Well, if it's doing this forever, there's probably a forever loop involved. So, let's see inside here. Let's click not on Harvard, but on the Yale sprite. And sure enough, if we focus on this for a moment, we'll see that the first thing Yale does is go to 0 comma 0. It points in direction 90°, which just gives you a sense of whether you're facing left or right or wherever. 
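The two custom blocks, listen for keyboard and feel for walls, can be sketched in Python as plain functions on an (x, y) position. The wall positions `LEFT_WALL_X` and `RIGHT_WALL_X`, the key names, and the helper `step` are assumptions made for illustration; "touching" a wall is approximated here by reaching its x coordinate.

```python
LEFT_WALL_X = -100   # hypothetical wall positions
RIGHT_WALL_X = 100


def listen_for_keyboard(x: int, y: int, key: str):
    """Translate one key press into simple arithmetic on x and y."""
    if key == "up":
        y += 1
    elif key == "down":
        y -= 1
    elif key == "right":
        x += 1
    elif key == "left":
        x -= 1
    return x, y


def feel_for_walls(x: int, y: int):
    """Back up one pixel when on a wall, which counteracts the last move."""
    if x <= LEFT_WALL_X:
        x += 1
    if x >= RIGHT_WALL_X:
        x -= 1
    return x, y


def step(x: int, y: int, key: str):
    """One pass of the forever loop: handle the key, then check the walls."""
    x, y = listen_for_keyboard(x, y, key)
    return feel_for_walls(x, y)
```

Holding the right arrow from the center moves the sprite one pixel per pass until it reaches the wall, after which every further move is undone and it stays put at x = 99.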
And then it forever does the following. If it's touching the left wall or touching the right wall, I was a little clever this time, if I may. I just kind of turn around 180 degrees, which effectively bounces me back in the opposite direction. Otherwise, I go ahead and, no matter what, just move one step. And this is why Yale is always moving back and forth. So, a quick question. If I wanted to speed up Yale and make this beginning of a game harder, what would I do? Yeah. >> Yeah. So, let's have it move like 10 steps at a time, right? This looks like a much harder game, if you will, like level 10 now, because it's just moving so much faster. All right. Well, let's try a third version of this that adds another ingredient. Let me full screen this and click play. And now you'll see the even smarter MIT homing in on me by following my actual movements. So, this is sort of like boss level material now. And it's just going to follow me. So, how is this working? Well, it's kind of a common game paradigm, but what does this mean? Well, let's see inside here. Let's click on the MIT sprite. It's pretty darn easy. Go to some random position, just to make it a little interesting lest MIT always start in the center, and then forever point towards the Harvard logo outline, which is the name the former student gave to the costume that the sprite is wearing that looks like a Harvard crest, and then move one step. So, as a corollary of the previous question, how do we make the game harder and MIT even faster? Well, we can change this to be like 10 steps, and now you'll see MIT is a little twitchy because this is kind of a visual bug. Let me make it full screen. Why is this visual glitch happening? It's literally doing what I told it to do. It just looks stupid. Yeah. Say again. >> Yeah. It's moving so fast that it's sort of going 10 pixels this way, but then it kind of overshot me. So then it's doubling back to follow me again, and it's doubling back this way. 
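The twitching can be reproduced with a small one-dimensional simulation in Python. This is a simplified sketch, not the student's code: the function `chase` and its parameters are hypothetical, and pointing towards the target is reduced to choosing left or right along a single axis.

```python
def chase(chaser_x: int, target_x: int, step: int, frames: int):
    """Each frame: point towards the target, then move a fixed number of steps.

    Returns the chaser's position after every frame.
    """
    positions = []
    for _ in range(frames):
        direction = 1 if target_x >= chaser_x else -1  # "point towards"
        chaser_x += direction * step                   # "move (step) steps"
        positions.append(chaser_x)
    return positions
```

With a stationary target at 3, `chase(0, 3, 10, 4)` gives `[10, 0, 10, 0]`: the chaser overshoots, doubles back, and overshoots again. With a step of 1, `chase(0, 3, 1, 6)` gives `[1, 2, 3, 4, 3, 4]`, so the wobble is still there but only one pixel wide, which is why throttling the step back makes the glitch far less visible.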
And because these are such big footsteps, if you will, it just has this visual effect of twitching back and forth. So, we might have to throttle that back a bit and make it five or two or three instead of 10 because that's clearly not desirable gaming behavior here. All right. Well, let's go ahead and do this. Let's put them all together just as your former classmate did when submitting this actual homework. The game will conclude, hopefully, in an amazing climax where you've won the game. So, we need someone ideally with really good hand-eye coordination to play this final game here. Yeah, your hand went up first, I think. Okay, come on up. Big round of applause because this is a lot of pressure to end on. [applause] All right. So, if you win the game, cake will be served. If you don't win the game, there will be no cake. >> Okay. But introduce yourself in the meantime. >> Hi, I'm Jenny Pan, freshman at Hollis and I'm actually a CS major or concentration. >> Nice to meet you. Head to the keyboard here. This now is the combination of all of those building blocks and even more, aka Ivy's hardest game. You will be in control, just as I was, of the Harvard crest. And the goal is to make it to the exit, which is this gentleman on the right here. And you'll see there's multiple levels where each level gets a little harder. All right, here we go. >> [music] >> All right, this is CS50 and this is week one, our second week together. And you'll recall that last week, week zero, we focused on Scratch, this graphical programming language by which you can drag and drop puzzle pieces that interlock together only if it makes logical sense to do so. And many of you had actually probably played with that in like middle school or even prior at some point. 
But for our purposes, the goals of Scratch were to give us sort of a mental model for some fundamental constructs that we're going to see again and again, today in C, in a few weeks in Python, and even thereafter. And those include things like functions and return values and arguments and variables and loops and conditionals and more. And so even if today feels like a bit of a fire hose, such as that picture here, appreciate that a lot of today's ideas are exactly the same as last week's ideas. It's just that the syntax is going to change. It's going to look a little different. It's going to look a little scarier. It's going to be harder to sort of memorize, except with practice will come that muscle memory, but the ideas ultimately are going to be the same. And indeed, if unfamiliar, MIT down the road has a tradition of hacks whereby students once a year do something fairly crazy. And at one point, they happened to connect an actual working drinking fountain to an actual fire hydrant. And the sign there, very pixelated, says, "Getting an education from MIT is like trying to drink from a fire hose." And that's indeed how computer science, how programming, how CS50 will sometimes feel, but realize that what's going to be ultimately most important is not where you feel you are day after day, but where three months from now you feel that you are relative to last week alone, so-called week zero. So, let's look back at what week zero looked like. It looked a little something like this. The simplest of programs by which we get that cat to say hello world. Today, that same code is going to start to look a little like this, which was a glimpse we gave you last week. But this time, I've deliberately color-coded it to try to send the message that in Scratch, we had this yellowish puzzle piece that sort of kicked things off that didn't really do anything itself, but it got the program started, whereas the real work was done in purple here. 
Same is going to be true today whereby I'm going to wave my hands for a little bit of time at this yellowish code on the screen. But what's really going to have the most effect is this same purple line here and the white text within. And we'll break down what all of these lines mean over the next couple of weeks. But sometimes we'll wave our hand at details if we feel it's a little unnecessary at this point in the story. And in fact, let me get rid of the color coding for now. And we'll see that this is the kind of code in a language called C we're going to start playing with and using today and for the next several weeks. And indeed, it's representative of what we're going to generally call source code. So source code is what programmers write. It's what you write. It's what you wrote, albeit by dragging and dropping puzzle pieces. This week onward, you're going to start using your keyboard all the more. And you're going to write source code. So this is code that we humans can understand with some training and with some practice. But of course per last week, what language do computers ultimately understand? Only >> so binary zeros and ones. And so you and I, yes, can write code starting today in a form that looks a little something like this, which admittedly might look a little arcane and cryptic, but it's certainly better than a whole bunch of zeros and ones. But we're going to write in source code. But the machines that we write code for ultimately only understand these here, zeros and ones, which may very well say hello world, but we're going to call this moving forward machine code. So machine code is what the the computers understand. Only the zeros and ones. Source code is what you and I understand and actually write. So it stands to reason that we're going to have to somehow translate one to the other from source code to machine code. 
And I alluded to this ever so briefly last week, but we're going to use this same mental model whereby the source code we write might be the input to some problem. The output we want therefrom is going to be the machine code. So what we're going to equip you with today inside of this proverbial black box is a special piece of software that takes source code as input, produces machine code as output, and that type of program is called a compiler. And there's bunches of different compilers in the world. We're going to have you use one of the most popular ones, but it's simply a piece of software that someone else wrote that converts one language to another: source code, for instance, in a language called C, to machine code, the zeros and ones that our Macs, PCs, phones, and other devices actually understand. So, where are we going to do this and how are we going to do this? So, I promised last week that we'd introduce you to this here tool, which I used briefly at the very start of class to whip up that chatbot. We're going to use it, though, not for Python this week, but indeed for a different language, C. And indeed, this tool, Visual Studio Code, or VS Code for short, is super popular in industry. This is what real programmers, so to speak, are using all of the time nowadays. There's absolutely alternatives. If some of you have programmed before, you might have used or experienced different tools, but this is a very common tool that you'll see even after CS50. And in fact, it's something that ultimately you can install for free on your own Macs and PCs so that by the end of the course, you're completely independent of CS50 and any CS50-related tools. But what we have done for the very start of the class is essentially provided you with a cloud-based version of this tool.
So all you need is a web browser on any Mac or PC or the like so that everything's pre-installed for you, preconfigured for you, and you don't have to deal with the stupid technical support headaches at the start of the term because it should just work. But by the end of the term, once you're a little more comfortable with technology and with code in particular, you can absolutely offboard yourself from this tool. Download it, install it on your own Mac or PC, and have pretty much the exact same environment completely under your control. So, starting today, you're going to see an interface that looks quite like this quite often. And we used this same interface last week ever so briefly. Moving forward, here's where we're going to write code. At top right is where one or more code tabs are going to appear, similar to any tabbed environment that you might use. Here, for instance, is just a screenshot of the first file we'll create today, called hello.c. The reason it's called hello.c is because it's in a language called C, as we soon shall see. No pun intended. Meanwhile, the code here happens to be color-coded, not quite in the same way as you saw before, because I manually made it look more like Scratch blocks. But among the features that VS Code and other programming environments provide is something called syntax highlighting, whereby you don't worry about or even think about these colors. But as you write out code in a recognized language, tools like VS Code will just color-code different parts of your code for you, just to make different features jump out. And we'll see what those features are over the course of today. But you'll also spend a good amount of time, as I briefly did last week, down here in the bottom right of your screen, the so-called terminal window, which is going to be where you run commands for compiling code and running code. And in fact, as we'll see today, you're going to start using your mouse and clicking a little bit less.
You're going to start using your keyboard and typing a bit more. And ultimately, even if at first that might feel like a step backwards, to sort of not use something that's so user-friendly, the reality is most every programmer tends to find themselves ultimately much more productive, much more powerful using the keyboard more often, more quickly than, say, a traditional mouse or trackpad would allow. Meanwhile, we'll see some somewhat familiar features here at left, like this is where you'll see the files and folders that we'll create over time. At far left here is going to be an activity bar, which is essentially a modern form of a menu via which you can open and close things and access other features. For my purposes, I'll generally hide this part here, and I'll generally hide this part here, so that when we're together, we're focusing almost entirely on code and commands, but I'm just typing some quick keyboard shortcuts to simplify my own user interface in that way. So, with all that said, just some terminology. So this whole collective environment that I'm describing here is generally what's known as a graphical user interface. Why? Well, it's an interface for users that's graphical in nature, with icons and buttons and the like. Shorthand notation for this is GUI, pronounced "gooey." But within this graphical user interface, as promised, is going to be that terminal window at bottom right where I promised we would be typing most of our commands. And just to give you a bit more jargon in computing, that's generally known as a command line interface, or CLI for short, whereby you're typing commands into that interface instead. And the world of computing software is essentially divided into GUIs and CLIs, and sometimes a piece of software might have one of each as well.
But without further ado, why don't we go ahead and focus entirely first on this here program, which I dare say is the simplest program you can write in a language like C, and see how we can actually compile and run it together. So, I'm going to go over to VS Code here, where I've hidden my file explorer with all the icons and I've hidden my activity bar so that I only have room for tabs of code and the command prompt at the bottom. I'm calling this a command prompt because it's at this dollar sign where I'm going to run some of my commands. And it's a dollar sign by convention. It has nothing to do with currency. It's just a computing convention. Some systems will use a caret symbol. Some systems will use a greater-than symbol, rather, or something else. But it just means type your commands here. The first such command I'm going to type is this: code hello.c, with a single space in between. I've not used any spaces in the name of the file. I've not capitalized any aspect of the file, just because this is convention. Unlike your Mac or PC, where you might be in the habit of naming files with spaces and capitalization, generally you'll make your life simpler by just using lowercase and no spaces at all. As soon as I hit enter, what you'll see is that a brand new tab appears called hello.c with a cursor blinking on line one. And this is essentially VS Code waiting for me now to type the first line of my code. Notice, though, that the command is complete, whereby I have another prompt here; if I click in the terminal window and give it the foreground, my cursor might blink there instead. That just means I can type another command when I am ready.
So let's go ahead and whip up this code, and I've done this many times so I can type it fairly quickly, but in this tab I'm going to do include stdio.h (standard io.h, so to speak), int main void, then inside of so-called curly braces, indenting therein by four spaces, I'm going to say printf, quote unquote, hello, world, backslash n, close quote, semicolon. And voila, I've written my first program in C. In a class like this, no need to write down each and every line of code that I write. In fact, on the course's website will be copies of everything that we've done, as well as excerpts therefrom in the course's notes, but you're welcome, but not expected, to follow along in real time with what I am typing here. So that's it. Like, I've written my very first program in C. If I had done this on an actual Mac or PC without a command line interface, I might have a new icon on my desktop, so to speak, called hello. And ideally, I could double click on that or tap on it and run the program. But because I'm in this specific programming environment that has a mix of a GUI and a CLI, I actually need to click down in my terminal window. And I need to now compile this program first, because at this point in time, it exists only as source code. So to do this, I'm going to compile my code by very aptly saying make space hello. And I'm pronouncing the space, but literally I hit the space bar. Make space hello, as it sort of implies semantically, will make a program called hello. Notice I have not said hello.c again, because the compiler, let's call it make for now, even though that's a bit of a white lie, is going to infer that if I want to make a program called hello, it's going to automatically look for a file called hello.c in this case. So, a bit of magic. Enter. And remarkably, anytime you don't see any output at a command like this, that's probably a good thing. Generally speaking, when you see output when compiling your code, you have done something wrong.
Or in this case, I might have done something wrong. But no output is good, because what I can now do, and this is a bit cryptic, is run this program not by double clicking or tapping anywhere but by doing dot slash hello with no spaces. And this is a bit weird, but what the dot slash means is that, having just made a program called hello, that program is going to end up in my current folder. It's somewhere in the cloud, yes, more on that in a bit. But the program called hello is just somewhere in my current folder. When I say dot slash, that's like saying go into the current folder and run the program therein called hello specifically. Now, as I often do, I'll cross my fingers, hope that I didn't mess this up in any way, and I should see in a second hello world indeed printed onto the screen. And so, just to recap those commands then. One, I ran code hello.c, which is a VS Code-specific thing. Code, short for VS Code, just creates a new file called hello.c. And then I'm on my way with my own keyboard. Make hello compiles that source code into machine code, thereby creating a new file called hello. And to run that program hello, I type this strange command, ./hello. But this is a paradigm that, no matter what you call your programs, we're going to see again and again and again. So even if you've not done something quite like this, it will very quickly get familiar. Yes, questions? >> When you say make hello, how does the computer know what part of the code is ascribed to hello? >> Good question. When I say make hello, how does the computer know what part of the code is ascribed to this program hello? It literally is going to take the entire contents of hello.c and turn them somehow into a program. >> And does it have to be, like, named hello? >> Does it have to be named hello? No. I could have called it goodbye or anything else, myfirstprogram.c, anything at all, so long as I change these words here accordingly.
>> But it needs to be, like, the same thing, like it needs to >> Yes. >> have, like, green.c and make green or whatever. >> Exactly. If you change the name there, you need to change your commands accordingly. Other questions on these here steps? No. All right. So let's tease apart what it is we just did and, like, why this code works in the way that it does. Well, to recap, in Scratch, we had a program like this. When the green flag was clicked, we wanted to say hello world onto the screen. The code that corresponds to that is roughly here. And indeed, notice that the yellowish or orangish code lines up with the when green flag clicked. The purple code here lines up with the say block. And the white code inside of here roughly corresponds to what was in the white oval that we kept using again and again last week. So, let's do more of a one-to-one correspondence. And these slides are deliberately designed to give you again that sort of mental model of taking the same ideas from last week and just changing the syntax this week onward. So when we have a function like this thing here, and recall that a function is just an action or verb that sort of accomplishes a small piece of work in code, in C specifically you're going to type, of course, not a purple puzzle piece, but the word print. Well, more technically printf, where the f, as we'll soon see, means format the printed output, because this is more powerful than just printing some raw text alone. Then you can have parentheses, open and close, left and right. And notice that it's no accident that MIT chose an oval for their input to functions, because it roughly looks like the start of a parenthesis, with parentheses on left and right. Meanwhile, what goes inside of the parentheses in the corresponding C code? Well, at the end of the day, minimally hello, world, because that's literally what we want to print to the screen.
But in C, unlike in Scratch, there's a bit of overhead, a bit of additional syntax that you just got to deal with to make clear to the computer what you want to print. In particular, you're going to have to surround everything you want to print with double quotes to make clear that hello is not some special function or variable or something else. Hello world is the English phrase that you want to print. So double quote here, double quote there means here's the beginning and the end of what I want to print. You're also curiously going to put a backslash n in most cases at the end of the word or words you want to print. We'll take that away in a moment and see what it does. And then lastly, and perhaps most annoyingly in programming circles, you have to finish your thought with a semicolon, much like in English you would finish most sentences with a period instead. And the thing about programming, with C in particular, is that if you mess up almost any of these details I just rattled off, something's going to go wrong. And so you're in good company. The very first program you try to write or try to compile, odds are it might not work correctly, but you'll develop over time the muscle memory for spotting all of these seemingly minor and actually minor details that do matter to the computer. All right. So if you're familiar, of course, with the notation of functions in mathematics, a function in code is really the same idea as a function in math, whereby the function f takes some input, for instance x, and generally produces some output. So if you're coming more from that background, realize that what we're really doing here is roughly the same, but in code recall that we can have different types of output. So if this is our grand mental model and, say, we've got a function inside of this black box that takes arguments, that is to say its inputs, it can sometimes have side effects.
And recall that side effects are often visual things that happen as a result. They display on the screen. Maybe it comes out of the speaker. It's something generally ephemeral that just happens. But it's not necessarily useful in the same way as another type of function that we'll return to in just a bit. But last week, recall that we got the cat with a speech bubble to manifest on the screen and say hello world in that speech bubble when the input was hello world and the corresponding function was instead say. So let's see if we can't now tease apart what the code we wrote is actually doing for us bit by bit. So let me go back to VS Code here and let me propose to break this in a little way. Let me delete the backslash n, if only because at first glance who knows or cares what that's doing. Let's just get rid of it if we don't understand it. I could now go back down to my terminal window and I could do ./hello, enter again. But there's seemingly no change, which is good. Doesn't seem like I broke it, but I've kind of misled you here. Why? Why did nothing seem to change? I didn't recompile it. So, recall that the compiler converts source code to machine code, but I already did that a couple of minutes ago. If I've changed the source code, it stands to reason that I need to recompile the code to actually see the effects of that. So, let me do that again. Make hello, enter. Nothing seems to have gone wrong, but let me now ./hello, enter. And it's subtle now. And in fact, let me go ahead and zoom in. It's really just an aesthetic bug insofar as functionally the program is still technically printing hello world. But what's seemingly wrong? Or put another way, what did the backslash n apparently do? Yeah? >> Yeah. So, it's somehow giving me a new line. And that's essentially what the backslash n denotes: give me a new line there. And why was I doing that? Well, really just for the aesthetics. Like, if this dollar sign represents my prompt where I type commands.
If anything, it just looks kind of stupid that I finished a program over here and then the prompt is on the same line. It just looks wrong, even though you could sort of argue that was my intent, even though in this case it wasn't. So, what would the alternative be? Well, what you're seeing here is what's actually generally known as an escape sequence, which are sort of special sequences of symbols, like backslash and n in this case, that do a little something unusual. And here's just a non-exhaustive list of some you'll encounter in the real world, including in CS50. Backslash n moves you to a new line. Backslash r is a so-called carriage return. If you've ever seen or used an old-school typewriter, this refers to the process of bringing the typing head back to the left end. So it sort of moves the cursor horizontally as opposed to vertically. This one's interesting: backslash double quote. Why does there exist this pattern, backslash double quote? Yeah? >> If you just write double quote, it closes the >> Exactly. So recall that phrase we tried to print out, like hello, world. If for some reason you didn't want to say hello world, but you wanted to say, sort of snarkily, quote unquote hello world or something like that, well, you can't put a quote, a quote, a quote, and a quote and expect the computer to know which quote corresponds to what. It's just arguably ambiguous. So if inside of double quotes you actually want to print actual double quotes, this is an escape sequence that tells the computer, this is not some quote delineating where my thought begins and ends. This is literally a double quote. And we'll see other situations in which a single quote or apostrophe is the same. We'll see crazy situations in which you want to print a backslash, but backslash already has some special meaning. So there's solutions to all of these problems. But let's not get too far into the weeds here.
But let me go back to the code and propose what the alternative otherwise might have been. If I didn't know about backslash n, my instinct to move the cursor to the next line might have been literally to just, like, hit enter or do something like this, like move the double quote, move the parenthesis, move the semicolon onto the next line. But this should start to rub you the wrong way. And indeed, one, this violates a principle of most programming languages in that most programming languages are line-based. You sort of start and finish your thought ideally on the same line. And this runs afoul of that. And two, even if you're seeing code for the first time, assume that this just looks stupid as well. To sort of move part of your thought to the next line, it just looks a little sloppy. And it is. So C and many other languages, Python among them, solve this by giving you these so-called escape sequences. So if you want a new line there, you do backslash n and you will get your new line there. Now, what I said is a bit of an overstatement, in that sometimes lines of code will be so long that they do wrap onto multiple lines, but generally that's a convention that we're going to try to avoid. All right, what else could go wrong? Well, let's do this. Let me go ahead and clear my terminal window, which I can do by hitting Control-L, or I can literally type clear. And I'm going to frequently do this just to keep the screen clear, even though it has no functional impact. It's just an aesthetic. Let me do something else accidentally. Suppose I forgot to finish my thought and I omitted the semicolon, but otherwise the code is perfect. Let me do make hello now. Enter. Now we're going to see some output that's a little more arcane. Let me go ahead and scroll back up here to make clear that what's just happened is I ran make hello, but I didn't get back to another prompt.
I don't see immediately a dollar sign, because there's an error message here that is almost as long as the code I tried to write. Not to worry. Let's see. Here is the name of the file in which the problem exists. Stands to reason that it's in hello.c. Here is the line number in which the problem seems to exist: line five. And that's helpful because it lines up with this. And then, if you care to count, this is the 29th character. So if I count from left to right, around character 29, something is wrong. Something is missing. So it's a pretty decent error message. In fact, it even says expected semicolon after expression. There's a little green caret symbol pointing me at the mistake. So this is, again, another value of the compiler. Not only does it know how to convert source code to machine code, it's also pretty good at finding mistakes in your code and trying to draw your attention to them. So how do I fix this? Well, assuming you've understood the error message at this point, you just go back in and add the semicolon. Let me go back down to my terminal window. I'm going to clear it just to clean up the mess. Let me rerun make hello. And now we are back in business. And indeed, if I do ./hello, I've got hello world back on the screen. Well, let's make one other mistake. Suppose that I forgot, as you sometimes will, to include this line at the top, which will make more sense next week, but for now, let's just omit it and dive right into the code. You would think this is enough, just printing out hello world. Well, here, let me go back down to my terminal window. Let me do make hello again now. And I'm going to get a whole different error message instead. So now the problem is still with hello.c. That makes sense. Line three. Okay. So somewhere in there printf is suddenly the problem, even though the semicolon is back and the backslash n is back. So let's keep reading. Error: call to undeclared library function printf with type int.
And then this is a whole mouthful. So, here is an example of an error message that, unless you're sort of conditioned to know what this means and you've seen it before, is quite a bit more cryptic, and it's unclear, like, what the solution to the problem is, especially when the rest of your code is truly correct. I've just forgotten something stupid. But how can I sort of think about this problem? Well, it turns out that another feature of C is that it comes with a bunch of header files, a bunch of files whose names don't end in .c, but end in .h. And these so-called header files, which end in .h, contain code that other people wrote that you can use in your own programs. So for instance, in this particular case, a header file is giving us access to what's more generally in computing called a library. A library is code someone else wrote that you can use. And I actually used a library last week when I did that import line and mentioned OpenAI, the company. I was actually using a library from that company that I had automatically downloaded and installed into my programming environment in advance of class, because I don't know how to implement a chatbot without standing on their shoulders and using a lot of the code they themselves wrote. Same idea here. Even though printf is a feature of C, if you want to use it, you have to include that library by telling your program to include the header file that defines that function. And you only know this by being taught it or looking it up in a book or a reference. But in this case, I wanted to use a header file called standard io.h, stdio.h. It is not studio.h. This is a very common bug online. If you find yourself typing studio.h, that's a typo; it's standard io.h. And in that file then is defined the printf function. So, if I go back to my code here, the solution to this problem truly is to just undo the deletion I made a moment ago.
Because what line one is now doing for me is it's telling the compiler, oh, by the way, I didn't write all the code that I'm about to use. Please include the definition of printf from this other file called stdio.h. And again, you'd only know this by looking it up in a reference, attending a lecture, or something like that. It's not obvious otherwise, but these are the kinds of things you very quickly look up. So, where do you look them up? Well, it turns out the ecosystem of C has, you know, hundreds of books you can buy or download, many, many, many websites. Among them is one of CS50's own. And in fact, the conventional way to look stuff up for the programming language called C is to look at the official manual pages, or man pages for short, for the C language. Unfortunately, many of them were written decades ago, and they were certainly written by fairly advanced programmers and not for a broad audience. And so what we have done is imported all of that freely available documentation, hosted it at our own URL here, manual.cs50.io, and we've essentially simplified it for those less comfortable, those of you who might be less familiar with or less comfortable with technology, and really for most people who aren't used to reading manual pages. It's just useful to have it written in teaching-assistant-like language instead. So for instance, if you go to a URL like this, you'll see CS50's documentation for this official library, stdio.h, that comes with C itself. If you go to a URL like this, you can look up the documentation for printf itself specifically. So for instance, let me go ahead and just give you a teaser for this. If I were to do the same on my own computer, I might see the CS50 manual pages here, and you'll see, header file by header file, a bunch of frequently used functions in CS50. We've also filtered the list down from a massive list to a much shorter list so that you can sort of see what's most likely useful to you.
If you go to a specific page like stdio.h, you'll see for instance here just over a half dozen functions that we won't touch on today beyond printf, but that we'll see in the class over time, that do useful stuff. For instance, printf prints to the screen. And we'll see other functions for opening files, closing files, and the like, because all of that's related to standard I/O, input and output. If I go to a specific man page within this header file, you'll see the standard formatting for these pages. So, here's the name of the function, printf, and it prints to the screen. You'll see a synopsis, and this indeed indicates we're in less comfortable mode. If you want to see the original, more arcane documentation, just uncheck that, and you'll see the original official documentation. But you'll see a mention of, like, what header file this function is defined in, so that you know what file to use in your own code. You'll see a so-called prototype, which is just the first line of code from that function. More on that in just a little bit. You'll see an English description. You'll see example code. Long story short, this is the authoritative answer. And even though you have access in this class to the virtual rubber duck at cs50.ai, and in other forms of it that you'll soon see, you should also have the tendency and the instinct moving forward to check the official documentation. And all of today's AIs are trained on things like the official documentation. So that's the source material that any of these AIs, the duck among them, are actually relying on. But what we're also going to see is that besides these official functions, there's some that CS50 itself has invented. We use these really as training wheels for just the first few weeks of the course, and then we take these training wheels off. But the reality is, in a language like C, certain stuff is just really hard or annoying to do.
Certainly if you're learning how to program for the very first time, or at least you are new to C. We'll eventually show you how to do it that way. But even if you just want to get input from the user, like a string of text or a number of some sort, it's generally not that easy to do in C, at least in these early days. So for instance, at this URL here, you can see documentation for CS50's own library and CS50's own header file, cs50.h. And you'll see such functions in the documentation as these: get_string, get_int, get_char, and a bunch of others as well. And we'll touch on those this week. But it will ultimately be a way of just getting useful work done quickly by standing on our shoulders and actually using functions we wrote to then solve problems of interest to you. So let's focus, for instance, on one of these first: get_string. A string in programming speak means text. Zero or more characters of text, like h e l l o comma space w o r l d. That is a string of text in computer speak. And it's obviously not a number like 50. It's actual text that you would type on the keyboard. We'll see then what other things we want to get. But with this function, we can start to replicate another program that we implemented pretty quickly last week in Scratch. So recall that in Scratch, this one was a little more interactive. I used another blue puzzle piece, ask, to actually get input from the user. And recall that, unlike the printf function today and the say block last week, this time we still have the same input-output model, but if we pass in arguments to a function that we're about to see, you can get back not just a side effect sometimes, but a return value, like a useful, reusable value, like the person's name, as we'll soon see. All right, so let's actually do this. In Scratch, the equivalent was asking the user, what's your name? Asking them that and then waiting for an answer that we can store in a variable.
Let me propose that in C, side by side, it's going to look a little something like this instead. At left we have the Scratch block, the ask function; here is the argument thereto; and then "and wait" just means it's going to wait till the user finishes typing. If I want to translate this to C now, today moving forward, well, it looks a little something like this. The closest analog in C, thanks to CS50's library, is going to be a function called get_string. So there's no C function called ask. And we deliberately named this function get_string just to make super clear what it is you are getting: a string of text in this case. And we've got the parentheses ready to go, indicative of this white oval for user input. If I want to prompt the user with that same phrase, what's your name, well, I can just put it inside of those parentheses. But what next do I need to add around my user input? >> You need the quotation marks. >> Yeah, I need the quotation marks just to make clear that these aren't special individual words. This is a whole phrase that I want to be displayed to the user. So, I'm going to indeed put double quotes around everything. And this is just an aesthetic. I don't in this case want to bother moving the cursor to the next line. Like, I want the user to see the question and I want the cursor to just stay there blinking, waiting for their input. But I don't want the cursor to be right next to the question mark. So, I'm deliberately just leaving a single white space there just to kind of scooch it over a bit so it looks a little prettier, at least to my eye. Now, we're not done yet, because we need to do something with this value. The get_string function, as we'll soon see, is going to prompt the user for me to type something in, like my name. But where do I want to put that? Well, MIT has the answer put in a variable called answer. And you can't rename that in Scratch. It's just defined as answer. But in C, what I'm going to need to do is something like this.
If you want to keep return values around from a function, you literally use an equal sign and then to the left of it, you put the name of the variable into which you want to put that return value. So in mathematics, we would use X, Y, and Z as our variables. Again, in code, as in Scratch, you can name your variables anything you want. By convention, they should usually be lowercase. They should not have spaces therein, similar to file names. But this is a pretty good analog now of what's going on collectively here. But C is a little more precise. It you can't just give the variable a name. You need to tell C or really the compiler what type of value you want to put in this variable. So if it's a string of text, you put string. If it's a number, you're going to put something else. But for now, it's a string. Per the function's name, it's going to give me a string. Now, we're so close to finishing this comparison. There's one detail missing. What's still missing from the code here? Yeah. >> Yeah. So, we have to finish the thought lastly with a semicolon. So, if you're getting to sort of the point already, like this is one of the reasons why we start with Scratch, you sort of you get the intuition pretty quickly. And even though nothing on the right hand side is particularly hard, there's just all these stupid little details that you have to ingrain in yourself over time. In this case for C, but for many programming languages, we're going to see the similar paradigm. But among the goals of the course too are to show you how ultimately languages have been evolving. And so one of the things we'll see in Python in a few weeks time that some of this syntax actually goes away because over time humans have gotten annoyed at older languages like this. Like why the heck do I have to keep putting a semicolon when it's clear that I'm at the end of the line. So we'll see among languages like Python we can get rid of some of these same features. 
But for now it's just a matter of remembering what goes where. All right. So, let's go ahead now and take that same idea of converting Scratch to C and actually do something with this code. Let me go back to VS Code here. I'm going to keep my file name the same, but what you'll see on CS50's website is that we'll add version numbers to each of the examples that I'm typing out. So, you can actually see the progression of these programs, even though we're not changing the name. And what I'm going to go ahead and do here, for instance, in hello.c this time, is the following. I'm going to go ahead and first get rid of the single hello, world. I'm going to go up here and include this time cs50.h. So, not one but two header files. And then inside of my curly braces, inside the so-called main function, as we'll soon call it, I'm going to go ahead and do this. Exactly the same line of code as on the screen before: I'm going to get a string, prompting the user for what's your name, question mark, space, close quote, semicolon. And as an aside, this will, we'll soon see, print on the screen what's your name. So that implies that the get_string function is actually using printf itself to print out that message. I do not need to use printf to display that message on the screen because I read the documentation for CS50's get_string function and I just know that it is using printf for me to achieve that particular goal. Now let me do something intuitive but not quite correct. If I want to print out that answer so that the expression is going to be not hello, world but hello, David or hello, Kelly, let me go ahead and say hello, answer, backslash n, to move the cursor down as before, semicolon. So this is not quite right. And even if you've never programmed before, you can perhaps see where this is erroneously going. Let me remake the program because I've changed the source code and I need new machine code. Nothing seems to be wrong syntactically.
But if I now do ./hello and hit enter, you'll see I'm being prompted: what's your name? So I'm going to go ahead and type in David and then hit enter. But when I do, if you know where this is going, what am I going to see instead? >> Hello, answer. And the computer's just doing literally what I told it to do. I said, quote unquote, print out hello, answer. But obviously that's not the goal that I have in mind. So how do I actually work around that? Well, what I really need to do is achieve the equivalent of this thing here, which we did by stacking blocks in Scratch, or nesting them, if you will, one inside of the other. So, I want to join the expression hello, comma, space, and that answer. And it turns out in C, you can't do it quite like this. Like, there isn't an analog of the join function, at least that we'll see today. So, we have to do this a little bit differently. We can do it, though, by maybe telling the computer: go ahead and print out hello, comma, space, and then maybe we can give it, like, a placeholder to plug in the name once we know the name. Because when I'm writing my code, I have no idea who's going to play this game, me or Kelly or someone else. So, what if we use special syntax to indicate where I want the person's name actually to go? Let me propose that we now do this. Instead of printing out, quote unquote, hello, comma, answer, let's go ahead and start printing out something, and I've got my parentheses ready to go, and I did my semicolon in advance this time. I want to somehow now say hello, placeholder. And you would only know this by someone having told you or from a reference online: %s is the placeholder for a string that you don't know when you're writing the code, but when someone else is running the code, it will be filled in and substituted with other input. So, hello, %s is the closest we can get to this. I still need, though, some other syntax.
I do still need those quotes on the left and the right. Just to be aesthetically pleasing, I'm going to put a backslash n there at the end to move the cursor, but now I've left room in my parentheses for one more thing. And you can perhaps guess where I'm going with this. Again, even if you've never programmed before, this is telling printf: print out h-e-l-l-o, comma, space, something. What should I probably pass in to these parentheses as a second input so that printf knows what that something is? Yeah. >> The variable. >> The variable name. So the variable in which I have the user's name. And indeed the convention is to put a comma after the quotes and then the name of the variable that has the value you want to be substituted for that placeholder. Now notice there's a collision of syntax and grammar here. The comma inside of the quotes is just an English thing: hello, comma, so and so. The comma outside of the quotes is meaningful to C because it delineates which is the first input or argument, to the left, and which now is the second. And we haven't seen this before in C. Up until now, we've only been passing one input, but you can pass in two or three or four. It completely depends on what the function is designed to expect. So, let me put this all together now. Let me go back to VS Code. Previously, we were literally printing out answer, but I can change answer to %s. I can move my cursor outside of those quotes, comma, answer, because that's the name I gave to that variable. I can go back down to my terminal window and clear it just to reduce clutter. Let me do make hello one more time. Seems to work. ./hello, enter. D-A-V-I-D. And now hello, David is printed. Okay, questions on any and all of that? >> I was wondering, with the header file, where is it pulling from? >> Good question. Where is it pulling these header files from?
So, what you are seeing here is a graphical user interface that's hosted somewhere in the cloud at cs50.dev, the URL I mentioned last week, and we're going to tease this apart in just a moment. That software is running on a computer, and that computer's got a hard drive or a solid-state drive, like folders of storage. Those files, cs50.h and stdio.h and many more, are pre-installed on the server to which I have connected, and they're stored in a standard place so that the compiler in particular knows where to look for them. And those are all things we did in advance for you. Yeah. >> Why does backslash n not create a new line? >> Why does the backslash n not create a new line? So it is: backslash n is essentially being printed here, which has the effect of pushing the dollar sign to the next line. Otherwise, the dollar sign would stay on that second-to-last line. Other questions? >> Why is there no backslash n on this? >> Good. Why is there no backslash n over here? >> Good question. My choice as the programmer. I just wanted to see the sentence, what's your name? And I wanted the user, me, to type my name immediately after it like this. But I didn't have to do it that way. I just wanted to show you the difference. >> Gotcha. And then also, just generally, when we're doing the work, should we always write the first four lines? >> Should you always write the first four? Oh, these. Yes. For today, trust me: do this, do this, do this, do this. And next week we'll understand even more what those lines do. However, slight caveat: only use cs50.h if you're using one of our functions. Clearly you don't need cs50.h if you're just printing something out as in the first example. Other questions? >> ...is dividing the first input and the second input. I understand that the second input is what I type as the user. The first input doesn't really feel like input for me because that's the question that you asked.
Can you like explain a little bit why both say input? >> Correct. So to to summarize the question on the right here, this input is effectively provided by the user. This first input though is provided by me. That's the way it is. So uh these are both inputs because they're being provided as inputs to the function. The origins of those inputs though are entirely up to what I'm trying to achieve. The first one I know in advance like I'm the programmer. I know I wanted to say hello, someone. The second input I don't know in advance. So I'm using a place I'm using a variable to store the value that I'm going to get when the get string function is used later on. But they're both inputs even though they're used in different ways. Good question. Any others? No. Okay. So, if we now have that done, well, let's just take a step back into the first question that was just asked about um where are these files? Let's take a look back at actually what it is we're actually using here. So, it turns out even though most of you are using Mac OS or Windows, there's other operating systems out there in the world. Phones have iOS. Uh iPads have iPad OS. Uh Android devices have Android, which is its own operating system. The operating systems in the world are the pieces of software that really just do the most fundamental operations on a device like booting it up, shutting it down, sending something to a printer, displaying something on the screen, managing windows and icons and all of that sort of commodity stuff that is used by other people's software as well. A very popular operating system in the programming world and in the world of servers in the cloud and on the internet at large is called Linux. 
And it's a descendant of something called Unix, which has been around for quite some time, and it's what many programmers, most programmers, use depending on their environments, insofar as Linux is very highly performant. Like, you can support thousands or millions of users on servers running an operating system like this. It tends not to, but it can, have a graphical user interface, which just means it can operate more quickly because it doesn't need all of these graphics that are really just for humans' benefit, not necessarily for web browsers and other devices. And Linux, insofar as it's usually used, or often used, as a command line interface, comes with a whole bunch of commands that you'll start to use and see over time. Now I've used a bunch of commands already. I've used code, which is a VS Code thing. I have used make, which is for today's purposes our compiler, but that's a little white lie that we'll distill next week. And then I've used ./hello, which is a command I essentially invented as soon as I created a program called hello. But there's a bunch of other ones as well. For instance, if I want to list the files in my current folder, I can type ls, for short, and hit enter. If I want to create a new folder, otherwise known as a directory, I can use mkdir to make a directory. If I want to remove a directory, I can use rmdir. If I want to remove a file, I can use rm. If I want to rename a file, I can use mv for move. If I want to copy a file, cp. If I want to change directories, change into a folder, I can use cd. Now, these too just take a little bit of time and practice to memorize, and they're all very terse insofar as the whole point of a command line interface is to let people navigate things quickly.
So, for instance, even though this will be a bit of a whirlwind, let me go back into VS Code and let me propose that we play around with just a few of these commands so that you've seen me doing it. But generally speaking, in CS50's problem sets, we will tell you step by step what commands to type so that you can achieve the same results. And then later in the term we'll stop bothering reminding you pedantically how to do this and that because it should come more naturally eventually. But for instance, let me go ahead and do this. Let me go ahead and reopen my file explorer at left. Yours will look a little different. You'll have a different number as your unique ID, but generally you'll see whatever files and/or folders you've created already. The first thing I created today was called hello.c. And then by using make I created a second file, I claimed, called hello. So the reason ./hello works is because there is in fact a program called hello in my current folder, ergo the dot, that was created when I compiled my source code into machine code. Now suppose for the sake of discussion that this is going to get messy quickly, because the more programs we create in class and for problem sets, you're just going to have a hot mess of files inside of this one main folder. Well, let's create subfolders like you might be inclined to do on your Mac or PC or Google Drive or whatnot. Well, we can do this in a bunch of ways. I could right-click or control-click on my file explorer, and I'll see a somewhat familiar contextual menu, and I can literally choose new folder, or I can rename things, or I can move things around by dragging and dropping them. But for today, let's focus more on the CLI, the command line interface. And again, commands like this. So, let me go back into VS Code, and let me propose that we do a few things just as a tour. First, let me delete the machine code. I'm done with this example.
I don't really want to keep these bits around unnecessarily. I'm going to delete hello. Not hello.c, but hello, the compiled program. When I type that, I'll be cautioned: remove the regular file, whatever that means, called hello? Here, I'm being prompted for a yes/no response. Y suffices. So, I'm going to hit y, enter, and watch what happens at top left. As soon as I use my terminal window and this command to remove that file, it disappears. I could have right-clicked on it or control-clicked on it, but this command line interface achieves the same thing. Now suppose that for problem set one and future problem sets, I want to keep, like, every program I write in its own folder just to keep myself organized, especially as the term progresses. Well, let me create a new folder called hello itself. So I don't want to create a program called hello. I want to create a folder called hello. Well, one way I can do this, per this here cheat sheet, is to make a directory, which just means folder. So, mkdir hello, enter. And you'll see at top left now I indeed have a folder. And it even has an obvious folder icon next to it. Now I could cut some corners. I could click and drag on hello.c and just drop it into hello. But again, let's stick with the command line interface. Let me go ahead now and move, mv for short, hello.c into hello. So this is the first command where I'm passing in not one word after the command, like code hello.c or make hello. Now I'm typing two words after the command because the way the move command is designed is to expect the origin as the first word and the destination as the second, so to speak, whereby if I want to rename hello.c, sorry, if I want to move hello.c into the hello folder, I should type it like this. Now, you can, just so you know, include a trailing slash, a forward slash at the end of the destination, just to make clear that you want to put this into a folder and not just rename hello.c to hello.
But because the hello folder already exists, Linux knows what it's doing. And it's just going to assume that. When you do that, watch what happens at top left. hello.c seems to have disappeared. But if I click this little triangle, ah, there it is. It's now inside of that folder. But now I've created kind of a predicament for myself. Let me clear my terminal window. And now let me type ls. And when I type ls for list, you'll see only a folder called hello. And it's color-coded just to call it out to your eyes. And there's a trailing slash just to make obvious that it's a folder. That's all done automatically for you by Linux, the operating system. But wait a minute, where did my hello program go? Like, where is hello.c? Well, it's in that folder. So I need to change into that folder or directory. And here, per the cheat sheet, we have cd for change directory. So, I can do cd, space, hello, with or without the slash, and hit enter. And now you'll see this. And it's admittedly a little cryptic, but my prompt has now changed to still be a dollar sign, but before it is just a constant reminder of what folder I am in. We adopted this as a convention. Many systems do the same thing, though the formatting might be a little different. This is just to help you remember where the heck you are without having to type some other command to ask the operating system what folder you are in. So now that I'm here, if I type ls and hit enter, what should I see? Just hello.c, because that's the only thing in that there folder. So now let's do maybe one other thing. Let's do make hello inside of this folder. That is okay. And notice at top left what just happened. Now I've got both files back. All right. Suppose I want to get rid of one. Well, I can do rm hello again. I can type y for yes to confirm the deletion. And now I'm back to where I just was. Now suppose I want to do yet other things. Suppose that I'm not really proud of this version of hello.c.
Let me keep it but rename it. Well, I can say, how about, mv hello.c old.c. I just want to rename the file. So mv can be used not only to physically move a file from one place to another. If you use it on two file names, it will just rename the file for you. So there's no rename command that you need use instead. But you know what? Nope. I regret that. This program was fine. Let's rename it back. So, let's move old.c back to hello.c. And watch at top left: it just renames the file again. Let me go ahead and make a backup, though. So, let me copy with cp hello.c into a file called, like, backup.c, just in case I screw this up. I want to have a spare around. Now, you see at top left, I've got both files. If I now type ls, you'll see both files. So, what's happening in the GUI is the exact same thing that's happening in the CLI. But, you know what? This was just for demonstration's sake. I don't need any of this. So, let me remove the backup, say y for yes. Let me go ahead and move hello.c out of this folder, which I could just kind of drag and drop. But how do I move hello.c to the parent folder, so to speak? I want to move it out of this folder. Well, you would only know this by having been told: dot dot is special notation that means the so-called parent folder. So, go back up in the hierarchy. And now, if it's not obvious, a single dot, which we have seen before, means this folder. Two dots means one step up. There's no triple dots or quadruple dots. You have to use different syntax, but more on that another time. So, watch what happens when I do move hello.c up into the parent directory. Notice at top left that the indentation changed because it's no longer inside of that same folder. And heck, now I'm going to go ahead and do this. I could go back to my main folder by doing cd dot dot to back out of this folder.
But when in doubt, or if you ever get yourself into a confusing mess, just type cd, enter, alone, and you'll be magically whisked away to your default folder, a home directory so to speak, even though that too is a bit of a white lie. So that will always lead you to where you start when logging in to cs50.dev, aka VS Code. And now I can see the folder, which happens to be empty, and the file. So let me go and do one last command, rmdir hello, to really undo all of the work such that we're now back to where the story began. But the point here is just to demonstrate that with these basic fundamental commands, you can do everything that you've taken for granted on Macs and PCs for years with a mouse instead. Questions on any of these here Linux commands? Yeah. >> Files in a folder, how can you, like, choose which one to open? >> Really good question. If you have five different files in a folder, how can you choose which one to open? Well, you can certainly do code, space, and the name of the file you want to open. Or we're going to see other tricks, like you can use an asterisk or star for a so-called wildcard and say open everything in this folder. And you can even use more precise patterns than that. So over time, once we have more files at my disposal, I'll be able to do tricks like that as well. Yeah. >> I don't know if I said it back. >> Uh-huh. >> When you, like, deleted the file, was that hello, was that hello? >> Sure. So one of the things I did in my VS Code a moment ago was, once I was inside of the hello folder into which I had put hello.c just for the sake of discussion, I then recompiled it by running make hello. And this example is a little confusing, deliberately, insofar as I've got a file called hello.c inside of a folder called hello. But because I compiled hello.c, I then created a program called hello as well. But that program hello was inside of a folder called hello. Which is only to say that you can totally do this.
You can't have a file in a folder in the same place named the same thing because they would collide. Like you can't do that on a Mac or a PC as well. You have to have unique names. But you can certainly put something inside of another folder without collision. Good question. All right. So let's introduce a few more building blocks and a few more things we can do. So besides these Linux commands which we'll now start taking for granted, we have a bunch of other features of of programming languages that we saw in Scratch. Let's now translate them to C. So conditionals were sort of the proverbial fork in the road enabling you to do this or this or some other thing based on the answer to a question, a so-called boolean expression. Here for instance in scratch is how we might express if a variable x is less than a variable y we'll go ahead and say x is less than y and out of context I didn't include it in the slide presumably we've created x and y and somehow given them values whatever they are but this is just now the conditional part of the program in C the way you would do the same thing is you would say if and then a space then parentheses which have nothing to do with functions if is not a function it is a feature of C that implements conditionals just like this orange block is a feature of scratch inside of the parenthesis you put your same boolean expression. So here too out of context if up here I have defined variables X and Y well I can certainly use them in this conditional and I can use this less than operator just like in math class to ask this question and the answer even though it's a less than sign is indeed if you think about it going to be true or false yes or no. It's a boolean expression. It either is less than or it is not. All right. Inside of the curly braces which are necessary here I'm just going to literally put our old friend print f. 
And there's nothing interesting here except the new phrase, x is less than y, with the backslash n, the semicolon, and the parentheses. This, though, is deliberate. Just like in Scratch, the say is sort of indented and sort of hugged by the orange if puzzle piece. Similarly, these curly braces are meant to sort of imply the same. It's sort of embracing these lines of code. As an aside, in C they're not always necessary. If you have a single line of code, you can technically omit them. However, what you'll see in C, as well as in CS50 in particular, is that we will generally preach a certain style, like any company in the real world would do, so that programmers who are collaborating on code all write code that looks the same, so that it doesn't devolve into a mess because everyone has their own convention. So this is a convention to which you should indeed adhere here. And then I've indented four spaces to make clear logically that this line of code only executes if the answer to this question is true or yes. Meanwhile, in Scratch, if we had an if-else condition, so a two-way fork in the road: if x is less than y, say so, else say x is not less than y. How can I do that in C? Well, if x less than y, something, else, something else. And what goes in between those curly braces? Well, just two different printfs: x is less than y, or x is not less than y. The only new thing here is we've added else and another pair of curly braces, just like we've got sort of two orange shapes hugging those two purple puzzle pieces there. All right, how about something a little more involved? And this looks like it's escalating quickly, but it's just because the Scratch puzzle pieces are so big. If x is less than y, then say x is less than y. Else if x is greater than y, then say x is greater than y. Else if x equals y, then say x is equal to y. How can we do this in C? Almost the same idea. If x less than y, else if x greater than y, else if x equals equals y.
Well, before we reveal what's in the curly braces: this is not a typo. Why have I presumably done this, even if you've never used C before? Yeah. >> Exactly. The single equal sign, which we've used already when storing a value from get_string into a variable like answer, is technically the assignment operator. So humans decades ago decided that when faced with the situation where they wanted to copy from the right to the left a return value into a variable, it made sort of visual sense to use an equal sign because you want those two things ultimately to be equal. Even though you kind of read the code from right to left in that case. I can only imagine at some point the same people were in the room and they were coming up with the syntax for conditionals and were like, oh shoot, we've already used equals for assignment. What do we now use for equality? And the solution in C, as well as in many other languages, is literally this. They use two. So this is the equality operator, whereas a single one is the assignment operator. And it's just because Scratch is designed for kids. No sense in confusing little kids with equal equal signs. So, Scratch uses a single equal sign, whereas C and most languages use a double equal sign. So, a minor divergence there. What goes in the curly braces? Nothing all that interesting, just a bunch more printfs. But here's an opportunity to see not only the equivalence of this Scratch code with C code, but a misdesign that we sort of tripped over, if briefly, last week. This is arguably not well designed even though it is correct. Why? Yeah. >> You don't need to ask. >> Yeah, we don't need to ask this third Boolean expression. Is x equal equal to y, so to speak? Well, logically, if we're using sort of normal person numbers, it's either less than or greater than or, by default, equal to. So, you're just wasting the computer's time and in turn the user's time by asking this third question.
So, slightly better here would be get rid of the else if just have a default case, an else block so to speak, that looks like this. if it stands to reason that there's only three possibilities, you only really need to interrogate two of them out of the three. So, a minor optimization, but you could imagine doing that again and again and again in your code. You don't want to be wasting the computer or the user's time if you can improve things like that. All right. So, now that we have these equivalences between Scratch code and C code for these conditionals, well, what other things can we throw into the mix? Well, uh C has a whole bunch of operators. And just so that you've seen a list in one place, you've got not only assignment and less than and greater than and equality, but a few others here as well. Now, even though in like Microsoft Word, in Google Docs, you can kind of do a greater than or equal to sign one over the other or less than or equal to, in C in most languages, you actually just hit the keyboard twice. You do the less than and an equal sign, or you do a greater than and the equal sign. And that's how you achieve the notion of greater than or equal to or less than or equal to. Um, this one we've seen. Anyone want to guess what uh exclamation point equals means? Otherwise pronounced bang equals. Yeah. >> Not equal. So generally in programming you'll see an exclamation point implying the negation of something else. The opposite. So you don't want it to be equal to, you want it to be not equal to. Now you might think, shouldn't it be not equal equal? Yes, but they're trying to save keystrokes. So this is the negation of that even though it doesn't quite look like it should be. just two characters instead of three. Um, and dot dot dot there's many other operators that we'll encounter in the wild over time. 
But also worth noting in C is that there's more than just strings. Strings, recall, were strings of text, and there are other types of data that you might get from a user or store. We've seen string, but we'll actually see a whole bunch of others. So in C we're going to see bools themselves: a variable that can be true or false, and that's it. So very much interrelated with Boolean expressions. A variable itself can be true or false. We're going to see chars, or characters. So not strings of text like multiple letters and words and the like, but just individual characters. C, unlike some languages, does distinguish between single characters and multiple characters. Double, or rather, let's jump to float. A float is otherwise known as a floating-point value, which is just a number that has a decimal point in it. A real number, if you will. But a float generally uses nowadays 32 bits total to represent those numbers. The catch with that is: how many total values can you represent with 32 bits, roughly, per last week? It was one of the few numbers I proposed you remember. It's like roughly 4 billion. But how many real numbers are there in the world according to math class? An infinite number. So we seem to have a mismatch between what we can represent in code and how many actual numbers there are in the world. Okay, so not to worry. If you need more precision, like more significant digits, well, you can upgrade your variable, so to speak, from a float to a double, which uses 64 bits, which is way more precise, twice as many bits, but it doesn't fundamentally solve the problem because really it's still finite and not infinite. And we'll end today with a look at what the real-world implications of that are. But besides floating-point values, there are just simple integers: 0, 1, 2, and the negatives thereof. But those conventionally use 32 bits, which means the highest a computer can count using an int would be roughly 4 billion.
But if you want to do negative numbers, it's going to be roughly 2 billion, so that you can also go all the way down to negative 2 billion. So that's not that large nowadays. A long uses 64 bits, which is a much bigger range of values, but there, too, still finite. And there's a bunch of others as well. So these are just the types of data that we can store and manipulate in our programs. But a couple of those, one in particular, specifically come from cs50.h. So among the things you get by including cs50.h in your code is access to not only get_string but these other functions as well. And we'll start to use these in a little bit, whereby you can get integers or chars or doubles or floats. We don't have a get_bool because it's not really useful to just get a true or false value typically, but we could have invented it. We just chose not to. But we'll frequently use these functions, which you can access by using that header file. But where are we going to put these values and how are we going to display them? Well, it turns out there's more than just %s. So %s was a placeholder for a string, but if you want to print out something like a char, a single character, you're actually going to use %c. If you want to print out a floating-point value, you're going to use %f. An integer, %i. And for a long integer, that is a long, you're going to use %li instead. So in short, there are solutions to all of these problems. These are not intellectually interesting details, but they are useful, practical things to eventually absorb over time. So let's go ahead and do this. Let's do just a few more examples together. In a little bit we'll adjourn for a short break, during which snacks will be served, as every week, out in the transept. But before we get to that, let's focus on these variables.
So in Scratch we had the ability to store a bunch of values in variables that we could create ourselves by creating new puzzle pieces. In C you can essentially achieve the same. So for instance, suppose that in Scratch we wanted to keep track of someone's score using a counter. Well, we might create a variable called counter and set it initially to zero and then eventually add one to it, add two to it, and so forth as they drop trash into the trash can, for instance. Well, in C, you're going to do something almost the same. You can choose the name of your variable just like I did previously with answer. You can assign it a value like zero initially, but per earlier, what more am I probably going to have to do in C on the right-hand side here? Yeah. >> I've got to give it a type. And a counter, insofar as it's numeric, is not going to be a string of text, and I don't think I need to worry about decimal points if I'm just counting the equivalent on my fingers. So int will suffice, and int is the go-to numeric type, at least if two billion plus values is more than enough for your case, which this is going to be. Still one minor thing missing. Yeah? >> The semicolon, to finish the thought. So that on the right is the equivalent to doing this here on the left. Suppose that in Scratch you wanted to increment the counter and add one to the score, add two to the score, and so forth. It might look like this: change counter by one, implicitly going up unless you did negative, which would go down. In C, you can do this actually in a few ways. And this looks a bit wrong at the moment. How can counter possibly equal counter + 1? This does not mean equality per se. The single equal sign, recall, is assignment, and it means take the value on the right and copy it to the variable, in this case, on the left. So this takes whatever the current value of counter is, zero, adds one to it, and then stores that one in the counter variable.
So now the value is one, and if you do it again it goes to two, goes to three, goes to four, and so forth. But honestly, this incrementation technique is so common that there's more shorthand notation for it. You can also just do this. Looks a little weird at first glance, but counter += 1; does the exact same thing. You just type fewer keystrokes. And honestly, doing this is so darn common in C that you can even do this: counter++ does the exact same thing by adding one to the variable. There's no plus plus plus or more pluses. It's only for incrementing individual values by one. So arguably this version and this version, albeit more verbose, are a little more versatile because you can add two or three or more at a time. And there are equivalents for decrementation, using minus minus or the minus symbol more generally in there. All right, so let's actually use this technique in some code. Let me go back into VS Code here. Let me close my file explorer and let's go ahead and create, maybe this time, a little calculator of sorts. Let me propose that we implement a very baby calculator, or rather, not even a calculator yet. Let's just compare a few values. So let me do this: code compare.c, to create a brand new program called compare. And then in here I'm going to do a bit of boilerplate. I'm going to go ahead and include cs50.h. I'm going to go ahead and include stdio.h. And I'm going to go ahead and do int main(void). More on that next week. And then inside the curly braces, let's use these new techniques. Let's give myself a variable called x and set it equal to the return value of get_int, that other function I promised exists. And let's prompt the user for a value for x with a sentence like what's x, question mark, and then a space just to nudge the cursor over. Let's get another variable y.
Set it equal to get_int again and ask the user this time what's y, essentially using the same function twice but to get two different values. Now let's go ahead and do something pretty mindless. If x is less than y, go ahead and print out with printf: x is less than y, backslash n to move the cursor, close quote, semicolon. So it's not that interesting of a program, but it's at least dynamic in that now I'm prompting the user for two numbers. So let's do this. make compare. Enter. Seems to have worked. And in fact, I can check that it worked by typing what command to list the files in my directory? ls for short. And now you'll see I've got hello.c, but no hello because I deleted that with rm a few minutes ago. I've got compare.c, which I just created. And then I've also got a program called compare. And the asterisk there is just a visual indicator that this is executable. It's a program you can run. It's not just a simple old file. Even though I didn't type ls previously with hello, it would have similarly had an asterisk next to it in this context. But you don't see that in the file explorer. If I now do ./compare, well, let's do something silly like one for x, two for y. Okay, x is less than y. Let's do it again. ./compare, two for x, one for y. Okay, and I see nothing. Well, why am I seeing nothing? Well, logically, I didn't have a condition for checking for greater than, let alone equal to. So, let's enhance this a little bit. Let me go ahead and minimally say, all right, else, if x is not less than y, let's go ahead and print out x is not less than y, backslash n, close quote, semicolon. So I'm at least handling that situation too. Let me clear my terminal window. Do make compare again. ./compare, one and two, works exactly the same. Now let me go ahead and do two and one. There we have better output.
Of course, it's not really complete yet, because if I do ./compare again and do one and one, it'd be nice to be a little more specific than x is not less than y. It's not wrong, but it's not very precise. So I can add to the mix what we did earlier, and I can say, okay, well, else if x is greater than y, say x is greater than y; else if x == y, go ahead and print out x is equal to y, backslash n, close quote. But here, too, someone observed that this is sort of stupidly inefficient. What line of code should I actually improve here to tighten this up? Yeah. >> Instead of that else if, just else. >> Yeah. So line 17. I think I can just get rid of that unnecessary question, because logically that's going to be the case at this point. And now I can go ahead and recompile this with make compare, ./compare again. Enter one and one. And now we're back in business, catching all three of those scenarios there. Questions on any of these things here? Why have I deliberately not done this? Let me rewind just a moment and let me hide my terminal window just to keep the emphasis on the code here. Why not do this and keep my code arguably simpler? Like, why not just ask three questions? Line 9, line 13, and line 17 here. Yeah. What don't you like? >> Because then it would check each and every condition. Even though, for example, the first one might be fulfilled, it would check the second and third. That's wasted. >> Exactly. It's another example of bad design, because now, no matter what, you are asking three questions, on lines 9, 13, and 17. Even if x ends up being less than y from the get-go, you're still wasting everyone's time by saying, wait, well, is x greater than y? You already might know that it's not. Is x equal to y? You already might know that it's not. And so these three conditions are mutually exclusive, and yet at the moment you're checking all three of them no matter what, even though logically that shouldn't be necessary.
So our first approach was actually quite a bit better. And in fact, just to show you the difference here, let me go back to this very first version, whereby I was only checking that one condition: is x less than y? Well, if you're more of a visual learner, you can actually draw out what code looks like in flowchart form. So here is a drawing of a program that starts here and ideally stops down here. And each of these figures in the middle sort of represents a logical component of the code. Here in the diamond is my Boolean expression, which represents the start of the conditional. So if x is less than y, I have a decision to make: yes or no, true or false. Well, if it is less than y, true, let's go ahead and print out, quote unquote, x is less than y, and then stop. However, the first version of that program, recall, just said nothing if it were not the case that x were less than y. That's because false just led to the stop of the program. There's no keyword stop. There's just no code to handle that situation. But the second version of the code, when I actually added an else, looked fundamentally a little different. So now the second version of that code asked, is x less than y, and if true, the behavior is exactly the same. But if it weren't true, if it were instead false, that's when I got the message x is not less than y. But in the third version of the code, where I added the if, else if, else if, the picture gets a little more complicated. Let me zoom in, top to bottom. Here we have a longer flowchart, but the questions are really the same. When I start this program I ask, is x less than y? If so, I print out x is less than y. However, in that last version of the program, I was still foolishly asking the same question. Well, wait a minute. Is x greater than y? Wait a minute.
Is x equal to y? And that's the version in which, again, I had all of that unnecessary code, which I just undid here, asking three questions at a time. Ideally I don't want to make that mistake by doing it again and again and again. So if I instead revert that code to else if and else if, then my flowchart looks a little bit different, because notice the sort of shortcuts now. If x is less than y, true, we do this and we're done. Super quick. If x is not less than y, fine. We do ask one more question: is x greater than y? Well, if so, boom. We make our way to the end of the program by just printing that. Only if it's the perverse case where x == y do we check this condition, no; this condition, no; this condition; and then, okay, now we can print out x is equal to y, because it must be, logically. Of course, as has been observed multiple times, this is a waste of everyone's time. So we can prune this chart more and just have one question, two questions, and that alone tightens up the program. So again, if you're more of a visual learner, most any block of code you can translate to this sort of pictorial form, but it really just captures the same logical flow that the indentation and the syntax and the code itself are meant to imply. All right, how about a final exercise with one other type here? Recall that these are the types available to us. Actually, two final examples here before we have a bit of a break. Here we have a list of types that we can use. And here we have a list of functions that we can use. Let's go ahead and make a program that's representative of something we do quite often nowadays, but using a different type. So, let me go back into VS Code. Let me close compare.c. Let me reopen my terminal window and clear it just so we have a new prompt. And let's go ahead and create a program called agree.c. It's all too often nowadays that we have to, like, agree to terms and conditions.
To be fair, it's usually in the form of a popup and a button that we click, but we can do this in code at the command line as well. Let me go ahead and include, to start, cs50.h, and include stdio.h. Let me again, for today's purposes, do int main(void), but we'll reveal next week why we keep doing that. And now, for a yes/no answer, it suffices just to ask for a single char, or character, not a whole string. So let's do this: char c = get_char, and let's ask the user, quote unquote, do you agree, question mark, for instance. And now I can actually compare that value for equality with some known answers. For instance, I could say if c == 'y', then go ahead and print out, for instance, Agreed, period, backslash n, close quote, semicolon. Else if c == 'n', let's go ahead and print out, for instance, Not agreed, period, backslash n, semicolon. Now, there's still room for improvement here, but notice we're just now using the same building blocks in C in different ways to solve different problems. But notice on lines 8 and 12, I've used single quotes, which I alluded to earlier. Why is that the case? Why single in this case here? >> Yeah, it's a single character. And this is just the way you do it in C. When you want to compare a single character, you use chars and you use single quotes. When you want to use strings of text, like multiple characters, multiple words, multiple sentences or paragraphs, you use strings and double quotes. So this would seem to work, but arguably I could be a little more efficient. If the user doesn't type y, I mean, frankly, I could just chop off this else if and make it an else, and just assume that if you don't give me a y answer, then at least I'm going to assume the worst and you don't agree. But even here, the program's not all that great. Let me go ahead and do make agree and then do ./agree. And do I agree? Sure. I'm going to go ahead and type y.
Meanwhile, if I type anything else, like n, or even, emphatically, no, that would seem to... Whoops. Why did that not work? Yeah. >> Exactly. So, among the features of CS50's functions like get_char is that it will enforce what type of data you're getting. So because I used get_char, if the user doesn't cooperate and types in multiple characters, get_char, like some of our other functions, is just designed to prompt them again and again and again until they cooperate. That's useful so that you don't have to deal with that kind of error checking. But here I could type N in uppercase, and that seems to now work. But that only works because of the else. Let me go ahead and do this, which is very reasonable. I'm going to go ahead and type Y, capital Y, which you would hope works. That feels like a bug at this point. Like, it's fine if we don't want to support yes and no. We just want to support y and n. But it's kind of obnoxious not to support the uppercase version thereof. So how can we fix this? Well, let me hide my terminal window. And I could go in and fix this as follows. I can say, well, else if c == 'Y', capital Y in single quotes. And then I could print out Agreed, period, backslash n, semicolon. And then I can do else, and that would work there. But what rubs you the wrong way, perhaps, about this solution? Even if you've never programmed before, just applying some of the lessons from last week. Yeah. >> It's redundant. I mean, I didn't technically copy and paste, but line 14 is identical to line 10, so I might as well have copied and pasted. And that's generally bad practice. Why? Well, if I want to change the English to say something else in that case, now I have to change it twice. And I'm just repeating myself, which is bad design. So, there are ways to address this through other types of operators that we haven't yet seen. If I want to ask two questions at once, that's fine.
I can do something like this. Well, if c == 'y' or c == 'Y', I can tighten things up using so-called logical operators, whereby I am now taking a Boolean expression and composing it from two smaller Boolean expressions. And I care about the answer to one of those questions being true. So whether it's lowercase y or uppercase Y, this code now will work. And if it's anything else, we're going to default to not agreed. So the two vertical bars (||), which is probably not a character you type that often, and it varies where it is on your keyboard depending on whether it's American English or something else, just means logical or. This is not relevant here, but you could also in some contexts use two ampersands (&&) to connote and. But this does not make sense here. Why? Why is it clearly not correct to say and in between these two clauses? Yeah. >> Exactly. The variable can't both be lowercase and uppercase. That just makes no sense. So, this would be a bug, but using two vertical bars here is in fact correct. All right. Well, let's do one final flourish here. Besides conditionals, we had loops. Recall that a loop is just something that does something again and again and again. Here, for instance, in Scratch, is how we might meow three times. In C, there's going to be a few different ways to do this. Here is one. You can in C declare a variable like i, for integer, or whatever you want to call it, and set it equal to three, the number you care about. You can then use a loop, and the closest to the repeat block is arguably a while loop. There is no repeat keyword in C. So we can't translate this verbatim, but we could say while i is greater than zero. Why? Because that's sort of logically what I want to do. If I start counting at three, maybe I can just sort of decrement one at a time and get down to zero, at which point I can stop doing this thing.
So I'm going to initialize a variable i to three, and then I'm going to say while i is greater than zero, go ahead and do the following. And at the end of that loop, before whipping around again, I'm going to use this line of code, which we haven't seen, but you can infer. i--, that is, i minus minus, just means subtract one from i. So this is going to have the effect of starting at three, going to two, going to one, going to zero. And as soon as it goes to zero, this Boolean expression will no longer be true. And so the loop will just implicitly stop, because that's it. So what are we going to put inside of the curly braces besides this decrementation? Well, I think I can get away with just saying meow. And that will now print one, two, three times. And yet that's interesting. I sort of counted instinctively one, two, three, even though I'm proposing that we count three, two, one. So can we implement the logic in the other direction, whereby we count up from zero instead of down from three? Well, sure, we just have to make a few changes. We can set i equal to zero initially. We can change our Boolean expression to check that i is less than three again and again. And on each iteration of this loop, let's just keep incrementing i with i++. And at this point it will have the effect of doing one, two, three. Three is not less than three, so I won't put any more fingers up. I will meow three total times. And again, if you're a visual person, here's how we might start counting at zero initially. Check that i is less than three, which it is initially. And if so, we print out meow. Then we increment i, and we get whisked around again to the Boolean expression, because that's how while loops work. You constantly have the condition being checked again and again. That's just how C works. Once I've incremented i from zero to one to two to three, three will not be less than three. So the answer will be false. So the loop will just stop. So that has the effect of achieving the same.
But it turns out that looping some number of times is so darn common that you don't strictly have to use a while loop. A for loop, so to speak, is another alternative, whereby the syntax is a little weird. It's a little harder to memorize, but it allows you to write slightly less code because you write more code on a single line. So the way you read a for loop is exactly the same in spirit. You initialize the variable: that's everything to the left of this first semicolon. You then check the condition, and the computer does all this for you. Is i less than three? If so, you execute what's inside of the curly braces, and then automatically the thing to the right of the second semicolon happens. So i gets incremented from zero to one. Then the condition is checked. Is one less than three? It is. So, we print meow again. And C increments i to two. Is two less than three? Yes. So, we meow again. i gets incremented to three. Is three less than three? No. So, the for loop stops. So, it's exactly the same, but just more magic is happening in this first line of code here, more than you yourselves have to actually write. And it's just arguably the more common convention. But both of them are perfectly correct if you'd like to do that yourself. So let's go ahead and actually implement now this beginning of a cat in VS Code. Let me go back to VS Code and close agree.c. Let me reopen my terminal window and create an actual cat in cat.c. And let's go ahead and do this initially the wrong way. Include stdio.h, int main(void). And then inside of main let's go ahead and print out, quote unquote, meow, backslash n, semicolon. And then, heck, let me just copy paste. So this is obviously the wrong way, the bad way to do this, because I'm literally copying and pasting. But it is correct. If I want the cat to meow three times, I can make this cat. I can do ./cat and I get my meow, meow, meow.
But let's now actually use some of those new building blocks, whereby we converted Scratch to C. And let me go back into this code, and I'll do the while loop first. So I could instead have done int i = 3, if we count down initially. While i is greater than zero, then go ahead and print out, quote unquote, meow, backslash n. And then do i++ or i--? i--, because we're starting at three. Now let me go back to my terminal window and clear it. Do make cat again. ./cat, and we get three meows. And this is now arguably better implemented. What if I want to flip things around? Well, I could now maybe do it the normal person way. I could start counting at zero. And I can do this so long as i is less than three. And I can do this so long as I increment i on each iteration. Now I can do make cat again. ./cat. Enter. And that too works. But there's another way I could do this. If I want to count like a normal person, like start counting from one and count up to and through three, I could do this. This is correct. It would iterate three times. But it's a little confusing, because now I have to think about what it means to be less than four. Okay, that means up to and including three. I could be a little more explicit and say we'll do this while i is less than or equal to three, using yet another one of those operators. So I can make cat yet again, ./cat, and that too would work. Now which of these is correct or best? The convention, truthfully, in general in code is to start counting from zero, and to count up to but not through the value that you want. So at least you see the starting point and the ending point on the screen, if you will, at the same time. But of course I can condense all of this a bit more and turn this whole thing into a for loop. And I instead could do for (int i = 0; i < 3; i++), and then down here I could print out, quote unquote, meow.
And if only because I typed fewer keystrokes that time, this feels a little nicer. It's a little tighter and more efficient to create, even though the effect is the same. Indeed, when I make this cat and do ./cat a final time, this here too gives me the three meows. So, what could go wrong? Well, sometimes you might be inclined to do something forever, and we might have done that in Scratch, and indeed we did when we had some things bouncing back and forth off of walls and so forth. You can achieve the same thing in code. In fact, in C we could use a while loop, but there is no forever block. So while suffices, but recall that the while loop expects a Boolean expression. And if I want to do something forever, I essentially need an expression here that's always true. So I could do something stupid and arbitrary like while three is greater than two, or while one is less than two. I mean, make a statement of fact that never changes; ergo, it's just going to run forever. But if the whole goal here is to do something forever and to get this Boolean expression to be true, the convention in programming is just to literally say while true. And that implies, and functionally means, that you will do this thing forever unless you somehow prematurely break out of those curly braces. More on that before long. So if I want to meow forever, I could now just do this. And this would be a deliberate infinite loop. But unlike a game where you might want it to keep going and going and going for some time, I'm not sure this is going to be the best thing for us. Let's go ahead and try this. So let me go ahead here and include, for good measure, CS50's library, if only because it too is giving us features like bools. Here I'm going to go ahead and say while true, and then inside of my curly braces I'm just going to print out meow, backslash n, semicolon. Let's go ahead here and make cat one final time. Let me go ahead here and do ./cat.
And [sighs] this is like the annoying cat game. Just meowing, meowing, meowing endlessly. Like, I've now kind of lost control over my terminal window. And mark my words, at some point you might do this, too. But let's go ahead and take a juicy 10-minute break here. We have some delicious blueberry muffins out in the transept. Come back in 10 and we'll figure out how to stop this cat. All right, so it's been about 10 minutes and VS Code is freaking out with: high codespace CPU utilization detected, consider stopping some processes for the best experience. So this is what happens when you have, intentionally or otherwise, an infinite loop, insofar as I've been printing out meow endlessly. And I was warned by my colleague that I probably shouldn't let this run too long because we might lose control over the environment altogether. But the answer to how to solve this is going to be Control-C. So there are a few cryptic keystrokes that you can use to generally interrupt things in this way. And in fact, if I go back, you'll see, yeah, I kind of lost control over my codespace here. I'm going to go ahead and try to reload the window altogether. But had I hit Control-C in time (let's hope this doesn't now go off the rails), Control-C would have been our friend. There we go. And we're back. Okay. So, now that we've got control over our so-called codespace again, how can we go about making our meowing program a little more dynamic, insofar as let's start asking the user how many times they want the cat to meow? Certainly, rather than do it an infinite number of times, and even rather than do it three times alone, I think we have all of these building blocks thus far. So, let me go ahead and stay in cat.c here and go ahead and delete the contents of my main function. And let's go ahead and do this. Let's give myself an int. And I'll go ahead and call it n for number, though I could be more verbose than that if I wanted.
I'm going to set it equal to the so-called return value of get_int, which recall is going to get an integer from the user. And, quote unquote, let's ask the user what's n, just like I asked earlier what's x and what's y, where n is the number of times I want the cat to meow. Now, how can I use this variable? Well, we have that building block, too. I could use a while loop or a for loop. And if I use a for loop, I could do this. I could initialize a variable i, for integer, and set it equal to zero initially. I could then do i less than, not three this time, but n. So I can use that variable as a placeholder inside of the loop to indicate that I want to do this n times instead of three. And on each iteration through this loop I can do i++. Of course, I could be counting down if I prefer by using decrementation. But logically I would say this is canonical. Start at zero and go up to but not through the value that you actually care about. And I'll go ahead now and print out, quote unquote, meow with a backslash n, semicolon. Back down to my terminal. Make this cat again. ./cat. Enter. I'm prompted this time for n. I can still give it three, and I'm going to get three meows this time. However, if I run it again with ./cat and a different input like four, of course, I'm going to get four meows instead. Now, what is get_int doing for me? Well, it does a few things, similar to get_char doing a few things for me. For instance, suppose that instead of answering this question correctly with a number n, I say something random like dog. That is not an integer. And so the get_int function is designed to reject the user's input implicitly and just reprompt again and again. I can try bird, and it's going to do this again. So somewhere in the implementation of get_int, there's a loop that we wrote that does this kind of error checking for you. But it doesn't do everything, because an integer is a fairly broad category of numbers. It's like negative infinity through positive infinity.
And that's a lot of possibilities. But suppose I don't want some of those possibilities. Suppose that it makes no sense to ask the cat to meow, like, negative one times. And yet the program accepts that. It doesn't do anything, or anything wrong. But I feel like a better designed program would say, no, no, no, negative one makes no sense; let's meow zero or one or two or more times instead. So, how can I begin to add some of my own error checking and coerce the user to give me the type of input I want? Well, let me clear my terminal window and go back up into my code. And why don't I do something like this? After getting n, let's just check if n is less than zero. Because if so, I want to prompt the user again. And I can prompt the user again by doing n = get_int, quote unquote, what's n, question mark, semicolon. Now what's going on here? Well, on line six I'm doing two things. I'm getting an integer from the user, and I'm not only storing it in the variable n, I'm also technically creating the variable n. So, I didn't call this out earlier, but on line six, when you specify the type of a variable and the name of the variable, you are creating the variable somewhere in the computer's memory. And it's necessary in C to specify the type. If the variable already exists, though, and you just want to reuse it and change it later on, it suffices, as in line nine, just to reference it by name. It would be sort of stupid to specify the type again, because C already knows what type it is, because you told C what it is on line six. So that's why lines six and nine are a little bit different. So let's see how this now works. Let me go back to my terminal window and remake this cat. Let me do ./cat again. Let me not cooperate and type in negative one again. And notice I am reprompted this time. Fine, fine, fine. Let's type in three. And now it works. But you can perhaps logically see where this is going. Let me go ahead and run this again. ./cat. Type in negative one.
Type in negative one. And huh, it didn't prompt me again. But that's consistent with the code. If I hide my terminal window here, you'll notice that I've got one, maybe two tries to get this question right. And after that, there's no more prompting of me. Now, you can kind of imagine that this is probably not the best way to do this. If I were to go inside of line nine and then move the cursor down and say, "Okay, well, if n is still less than zero, well, let's just do get_int again and ask what's n question mark. And heck, okay, if it's still less than zero, well, let's just keep asking the same," right? Why is this bad? I'm repeating myself. I'm essentially copying and pasting even though I'm retyping. I mean, this just never ends, right? Like, how many chances are you going to give the user? In spirit, you'd hope that they don't not cooperate this many times. But really, to do this the right way, we should probably prompt them potentially as many times as it takes to get the correct input. So this is not the right path for us to be going down. But of course, we have already now this notion of a loop, whereby we could just do this in a loop. Ask the question once and maybe just repeat the question again, but the same question. So how might I do this? Well, let me go ahead and delete all of this. And let me just try to spell this out logically. So, I want to get a variable n from the user. And let's go ahead as follows. While true. I know how to do infinite loops now. And even though that created a problem for me with the cat, I bet we can sort of terminate the loop prematurely like I proposed earlier, as follows. I could do this: int n equals get_int, and ask the user again what's n question mark. And then I could do something like this. If n is less than zero, well then you know what? Go ahead and just continue on with the same loop. Else, if it is not the case that n is less than zero, what do I want to do? 
I want to break out of this loop. So this is new syntax. This is something you can do in C whereby if n is less than zero, fine. Continue means go back to the start of the loop and do the same exact thing again. Otherwise, if you instead say break, it means break out of the loop and go to below whatever curly brace is associated with that loop. So, continue essentially brings you to the top. Break brings you to the bottom, if you will. So, logically, I think this is right, but this code curiously isn't quite going to work and get me a value for n. Let me go ahead and open my terminal window again. Let's make this cat. And, huh, cat.c, line 19, character 25 is an error. Use of undeclared identifier n. Well, what does that mean? Again, cat.c, line 19. Let me hide my terminal window. Highlight line 19. n is being used in line 19, but I created it in line 8. And so what's the problem? Why is it not declared, seemingly? Yeah? >> Because you are using it, like, within the loop that you wrote. >> Yeah, this is a subtlety, but I'm creating n inside of this loop. I mean, literally between the curly braces on lines 7 and 17. The implication of which, because of how C works, is that that variable only exists inside of that while loop. This is a problem of what's known as scope. The variable n only exists inside of the scope of the while loop in which it was declared. So how do I actually fix this? Well, I need to logically somehow declare that variable n outside of the loop so that it exists later on in the program as well. And there's a few different ways I can fix this, but the best way is probably to move the declaration of n, so to speak, the creation of n, outside of the curly braces and maybe kind of squeeze it in here below line five. So still inside of main, whatever that is. More on that next week, but in the same curly braces as everything else. So I can in fact do this, and this is where the syntax gets a little bit different. 
I can solve this quite simply as follows. I can go down to a new line six and just say int n semicolon, and that's it. This declares a variable called n. It creates a variable called n. And initially it doesn't give it any value. So who knows what's in there. More on that another time. But now on line 9, I don't need to recreate it. I just need to assign it a value. And because now n has been declared on line six, between the curly braces on line five and all the way down on 24, now n is in scope, so to speak, for the entirety of this code that I've written. So let me reopen my terminal window and clear that old error. Let me do make cat again. Now the error message is gone. Let me go ahead and do ./cat. What's n? Now I'm back in business and I can do three for meow meow meow. Better yet, because I'm inside of a loop now, watch that I can do negative 1, negative 1, negative 1, negative 1, and a few other negative numbers. Finally, I can cooperate with something like three. And that's because I'm in a loop that by design may very well go infinitely many times until the user actually cooperates and lets me break out of that exact loop. Now, I strictly speaking don't need both continue and break. I wanted to demonstrate that both exist, but this is like twice as much code as I actually need, if logically I just want to break out of this loop if and only if n is greater than or equal to zero, because I'm sort of comfortable with the idea of zero meows but negative makes no sense. Well, I can just flip the logic. I can say if n is greater than or equal to zero, then go ahead and break. And I've tightened up the code further. I could technically do something else. I could say something like if n is less than zero, but wait a minute, I want to negate that. You can start to do tricks like this: an exclamation point with some additional parentheses. So you can invert the logic. It's arguably a little hard to read, even though that would be logically correct. So I'm just going to say more explicitly as before. 
If n is greater than or equal to zero, break out of this here loop. All right. So this is one way to use an infinite loop. But it turns out there's another construct altogether that is a feature of C. Instead of using a while loop and forcing it to be infinite by using while true and then eventually manually breaking out of it, there exists another type of loop altogether, and that's called a do-while loop. And you can literally say the word do, which means do the following. Then you can do exactly what we did before: n equals get_int, quote unquote, what's n question mark. So exactly like before, but then after those curly braces you use a while keyword. So at the end of the loop instead of the beginning, and that's where you put your boolean expression. I want to do all of that while n is less than zero. So you can kind of invert the logic and now kind of tighten things up further by just telling the computer do the following. What's the following? Everything in between those curly braces, while n is less than zero. And this implicitly handles all of the continuation and all of the breaking by just doing what you've said. Do this while this is true. But the difference between this do-while loop and a normal while loop is literally that the condition is checked at the bottom instead of the top. So when you say while, parenthesis, something, that question is asked first and then you proceed, maybe. Here, this condition is only asked at the very end. And why is this useful? Well, oftentimes when writing programs you want to do something at least once, like you obviously want to ask the user this question at least once. There's no point in asking a question like while true or while anything else. You should just do it, and then you should do it again if the expression evaluates to true and tells you to do something. Now, you haven't played with these loops yet, most likely, unless you have programmed before. There's a fun sort of meme that's apropos of this moment. 
So let's see if this maybe causes a few chuckles. If you remember Looney Tunes here, is this funny for people in the know? There we go. Thank you. Okay, this doesn't make sense. It eventually will. And it still might not be funny, but it will at least make sense. And it illustrates the difference between a while loop and a do-while loop: the Road Runner is stopping because he's checking the condition. While not on the edge, he'll run. But if he is on the edge, he's not going to proceed further. But of course, the coyote here, he's going to do running no matter what. And then only too late does he check, haha, whether he's still on the edge. All right. So, ah, thank you. All right. Now, you're cool. All right. So, many more memes will now make sense as a result. But let's go ahead and revisit this code and maybe do something a little bit different here, whereby we no longer want to just fuss around with some of these conditionals and these loops. Let's actually make the software a little better designed. And to do this, we'll revisit an idea that we touched on last week and that had to do with problem set zero, which was like create your own function. Like, C does not come with everything you might want. The CS50 library is not going to come with everything you might want. And at the end of the day, a lot of programming is about abstracting away your ideas. So you solve a problem once and then reuse it, reuse it, reuse it. And heck, you can package it up in a so-called library like we have and let other people use it as well. So here for instance in Scratch is how we could have implemented the notion of meowing, by getting the cat to play the sound meow until done. We abstracted it away and then we had a magical new puzzle piece called meow. In C, this is going to be a little weird today, but next week these details will start to make more sense. You would instead do the following. Literally type void, the name of the function you want to create, and then void again in parentheses. 
For now, know that this is the return value of the function. So void means it returns nothing. This is the input to, or the arguments to, the function. Void means it takes no inputs. And that makes sense because literally meow doesn't return anything. It doesn't take anything. It just meows. It had a so-called side effect, audibly, last week. So this means, hey C, invent a function called meow that takes no input, produces no output, but does have a side effect of printing meow on the screen. Meanwhile, if I wanted to do something like this in code last week, where I meowed three times, well, that's fine. We have the building blocks for this. And here's where inventing your own function starts to get more compelling. I can abstract away the notion of meowing now. Like, this doesn't come with C. It doesn't come with the CS50 library. I just created in the previous code this meow function. So, I can, in code, with a for loop and that new function, meow three times. But I can abstract this away further. Recall that the refinement in Scratch last time was this. I could edit the new function and I can say it actually does take an input, otherwise known as an argument, called n. And I clarified that this means to meow some number of times. And then inside of those Scratch blocks, I repeated n times the meowing act. Well, in C, I can achieve the exact same thing, even though it's going to look a little more cryptic. Meow still returns nothing. It has an audible or visual side effect, but it doesn't return a value. But this version does take an input. And this might look a little weird, but just like before, when you create a variable in C, you specify the type and the name. When you invent your own function in C and it takes one or more inputs, aka arguments, you specify the type and the name of those as well. No semicolons up there, just inside of the parentheses. And you'll get used to this convention with practice. 
But the rest of this code is exactly the same, except instead of three, I'm now using n. So again, I'm just composing the exact same ideas as last week, even though it looks way more cryptic this week, but it will become more and more familiar with more and more practice. So how can I go about implementing this myself? Well, let me propose that we do something like this. Let me go back to VS Code here and let me go ahead and let's really delete most of the code that I've written inside of main. And let me just suppose for the moment that meowing exists. And I'm going to go ahead and say, for the first version, for int i equals zero, i less than three. So we're not going to take input yet. i++. And then I'm going to go ahead here and say meow is what I want this function to do. Now if I scroll back up, you'll see there's no definition of meow yet. So I'm going to invent that too. I'm going to go up here and say void meow void. And again, this first version means no input, no output, just a side effect. And that side effect super simply is going to be to say just quote unquote meow with a backslash n. And now if I go and open my terminal window, clear it from before, do make cat, so far so good. ./cat, we're back in business, but I've abstracted the function away. Now, much like last week where I sort of dramatically dragged the meow definition way down to the bottom of the screen just to make the point that you don't need to see it anymore. Out of sight, out of mind. Let me sort of try to do the same here. Let me highlight and delete that and like go way, way, way down arbitrarily just to be dramatic and paste it near like the hundredth line of code and scroll back up. Now, out of sight, out of mind. I've already implemented the idea of meowing. We don't need to see or talk about it again. But there is a caveat in C. When I now clear my terminal and make this cat, now I've introduced a problem and there's like more problems, it seems, than code. 
Let me scroll back up to the first such error and you'll see this: on line nine of cat.c, character 9, there's an error. Call to undeclared function meow, and then something fairly arcane, but that means that meow is no longer recognized as an actual function. I know that it doesn't come from cs50.h, and I know it doesn't come from stdio.h. It's just down there. But why is the compiler being kind of dumb here? Uh, yeah? function. >> Yeah, because insofar as the first version worked, logically it would seem that putting it at the bottom was just a bad idea, because C compilers are fairly simplistic. Like, they won't proactively do you the favor of checking all the way down at the bottom of the file. They're going to take you literally. So if meow doesn't exist as of line 9, that's on you. Like, that is an error. So I could fix this by just undoing what I did and moving it way back up to the top. But let me argue that in general when writing C programs, the main function, which I keep using and we'll talk more about next week, is literally meant to be the main part of your code. And so it kind of stands to reason that it should be at the top, because when you open the file, it'd be nice to see the main program that you care about, the main function. So there's an argument to be made that it's a little annoying to have to put my functions all at the top, which is just going to push main further and further down. So there is a solution, and this is, I dare say, the only time copying and pasting is appropriate. Let me delete most of these blank lines, which are unnecessarily dramatic, and just move it below main, as over here. The solution here though is to do this: to copy the first line of the meow function, its so-called signature, and then just put that one line and only that one line with a semicolon above main. And this is what's known as a prototype. 
So a prototype is just a bit of a hint to the compiler, a promise if you will, that hey compiler, there will exist a function called meow. It takes no input and it returns no output, semicolon. And it's on the honor system that it will eventually exist later in the file. We'll talk more next week about why that works, but this is sort of a promise to the compiler that it will eventually be defined. Now, what I've done here on line four, as an aside, is what's generally known as a comment. I just wanted to put on the screen exactly what I was verbalizing. Anything in C that starts with two slashes is a note to self, like a sticky note in Scratch, which is just for the human, not for the computer. And it's a way of reminding yourself or someone else what's going on on that line or those lines of code. But I'll go ahead and delete it for now as unnecessary, because now if I go back into my terminal and clear those errors, make this cat again, now it does work, because the meow function has been declared exactly where it should be. And now I can make the new version of this cat even better. I could change the function meow to take a variable n as input for the number of times. And then in here I could do something like my for loop: for int i equals zero, i less than n, i++. And then in this for loop I can print out quote unquote meow. And then I'm going to have to change this too, because I have to copy and repaste it, if you will, or just manually fix that. But now I can get rid of all of this and do meow three, for instance. And this now will be the second version of the Scratch code, if you will. Make cat. Still going to work exactly the same. Meow meow meow. But now I've implemented my own function that does take input even though it doesn't happen to return any output. All right. Questions on any of these examples just yet? Confusion? All right, let me add one other feature to this to demonstrate that we can take not only input but actually produce output if we want. 
If I go back into this code here, let me propose that it's a little silly to be hard-coding, that is, fixating on, three. It'd be nice to get input from the user. So I could do this. I could use int n equals get_int and say something like what's n question mark, and then I could pass n in, if only to demonstrate a couple of things. So one, now the program is dynamic. I'm going to ask the user how many times to meow and I'm going to pass in that value n. Now, this deliberately is confusing at the moment, because wait a minute, I've got n defined here, used here, but then redefined here and then reused here. So it turns out that even if you create n up here and use the name n, no other functions can see it, for that same issue of scope. So for instance, suppose I didn't quite remember this and I sort of naively just said void. Meow doesn't need to take any inputs because, heck, n is already defined in main. Let me go ahead and open my terminal and clear it. Make cat and see what error comes out here. Well, error, cat. Oh, sorry. I made two mistakes here. I also have to change the prototype up here to say void, which means again meow takes no inputs. Let me go ahead now and rerun make cat. And there we have an undeclared identifier again, n. So in cat.c, line 14, which is here, it doesn't like that I'm using n. But wait a minute, I created n here. But it's the same logic as earlier. That's fine. You created n on line 8. But where does n exist? In what scope? Yeah, only between the curly braces, which is lines seven and 10. So by the time you get down to 14, it's out of scope, so to speak. So it just doesn't work. So the solution is exactly what I did the first time. I can pass it into meow as input, and I have to tell C to expect that input. And I can use the same name, but arguably that's going to get confusing sometimes. But let me do this. Let me go back into my code. 
Let me undo this change such that now meow does take an input, but instead of just calling it n and using n everywhere for number, this is crazy. Let's just call this like times. So meow takes some number of times and then it uses that value. And now I'm passing in n on line 9, but in the context of the meow function on lines 12 onward, that same value is now referred to as times, because you're passing it in as input and giving it its own name. And that's totally your prerogative. It's just a matter of scope. I mean, I could have called it m or some other letter of the alphabet, but times is even more clear because that's the number of times I want the cat to meow. But again, the whole point here is just this matter of scope. All right. So, let's take a higher level look now at some of the things we've been thinking about, and then we'll do a final deep dive or two on some of the problems that we can solve with all of these building blocks and some of the problems that we're sort of ignoring for now. So, when it comes to writing good code, CS50 and really the world in general tends to focus on these kinds of axes. Correctness, design, and style. What does this mean? Correctness just means does the code work the way it's supposed to? In the context of a class, it should do exactly what the homework assignment, aka problem set, tells you to do. In the real world, it should do exactly what someone decided the software should do, the product manager, the CEO, or the like. Correctness just means it behaves as it should. That's different though from how well designed the code might be. And we've seen that a few times. I've had some simplistic examples in Scratch and C that were 100% correct. Like, it did the right thing logically, but I was wasting the computer's time. I was wasting the human's time by asking more boolean expressions than I needed to and so forth. 
So design is more about, like in the world of English, not only saying things that are correct but doing it well, like making a good, cogent argument, not just one that happens to be correct. Style, meanwhile, is the third axis on which we might evaluate the quality of someone's code, and that's more of the aesthetics. Like, is everything pretty-printed, that is, nicely indented? Are variables well named and not just called x, y, z arbitrarily or something like that? So style matters really to other humans, not to the computer, but to other humans. And to illustrate these, you'll see that in problem set one onward, you'll be given a number of tools that you can use. So one of those tools is called check50. And in each problem set problem in C and Python and other languages, you'll be shown how you can test your own code. And you can literally run a command that CS50 created called check50. You'll then specify what's called a slug, which just means a unique identifier for that homework problem, and you'll get quick feedback on whether or not your code is correct. It doesn't mean it's well implemented or well designed or pretty, that is, well stylized. But at least that's the first gauntlet in getting good code submitted. Design though is much more subjective. Design is something you get feedback on from a human, for instance in section or from a teaching assistant, or in software. You can actually see atop VS Code there's a couple of buttons that I haven't yet used but could. design50 is built on top of the CS50 duck, whereby if you have a program open in a tab and you click design50, you will get ChatGPT-like advice on how you can improve not the correctness of that code but the design of that code, the quality thereof, which is a bit more subjective and modeled after what a good teaching assistant might say. 
style50, meanwhile, is a third tool that will provide you with feedback on the style of your code and will show you on the left what your code looks like and on the right what your code really should look like, insofar as it should be consistent with what we've taught in class and consistent with CS50's so-called style guide. And those of you who have some prior programming experience undoubtedly won't like some of CS50's stylistic choices. And that's going to be the case in the real world, too. But as I alluded to earlier, in typical companies, you would have an official style guide or tool to which everyone adheres so that everyone's code actually looks the same as everyone else's even though people have contributed different solutions to problems. So correctness, design, and style is how not only we but really the world at large tends to evaluate the quality of code, and we do it by way of these CS50-specific tools here. All right, how about one final flourish then to this here program? Back in VS Code, I've got a correct solution right now. It's well styled, I'll stipulate, even though it could stand to have some more comments. So, for instance, I could do something like this, like meow some number of times, a comment to myself. Or up here I could say something like get a number from user, just to remind myself and my TA or my colleague what it is this code is doing. But what more could I do in the way of design? Well, this function here, get_int, will indeed get me an integer, but not just positive or zero but negative too. And I could go in and add a bunch of code like before. Like, instead of this line I could do something like int n semicolon, do the following. All right. 
n equals get_int, and then I can say what's n question mark, and then after that I can do something like while n is less than zero, keep doing that. So I can have a pretty verbose implementation of getting user input, or I can implement another function of my own that only gets a positive integer, or rather a non-negative integer, from the user. For instance, I might do something like this. I could declare, maybe below my main function, a function like this: int, how about get_n, and then inside of the parentheses I might say void, because I'm not going to pass in any input. Then inside of this function is where I'm going to do int n, do while, n equals get_int, quote unquote, what's n question mark, and then down here I'm going to do while n is less than zero. But rather than do something immediately with n, because I'm no longer inside of my so-called main function, what I'm going to do, which is new, is return this value n. And notice that this notion of returning a value, which is the first time I've done this explicitly, is consistent with this little hint here on line 19, which implies that this get_n function, which I'm inventing, is going to return not void, which means nothing, but an integer. And that's the whole purpose of this function in life. Now, if I scroll back up here, I can get rid of this whole block of code and just say get_n from the user, and then I can immediately call meow with that value. I need to do one other thing. I need to highlight this line of code here, and I'm going to go ahead and add another prototype up top, which is the only time again, for now, that copy-paste is encouraged and best to do. So, I've invented my own function get_n. The whole point being, now I have this sort of abstraction here of a function whose sole purpose in life is to get me not just an integer but one that is zero or positive and not negative. If I open my terminal window, clear the mess from before, make this cat, ./cat. What's n? 3. 
I'm now back in business. And again, we've essentially translated from Scratch last time into C this time exactly how we might modularize the code now, abstract away these lower-level details, and ultimately create my own function that, as before, takes arguments, but in this case has not side effects but rather a return value this time. All right. So as you walked in, we had a little walkthrough of Super Mario Brothers playing, from yesteryear, which was a side-scrolling game in which Mario would jump and go up, down, left, right, and try to collect coins and make it to the end of the level. There's a lot of obstacles throughout this kind of game, whereby the world might look a little something like this. Like, there's a pit that Mario's got to jump over, and then there's these coins hidden typically behind these question marks that he can jump up and hit his head on and actually accrue points. Now, we're not going to do anything graphical just yet. We're leaving graphics behind for now in the form of Scratch. But with C, we can implement some of these ideas. For instance, if I were to write code to generate just this row of four question marks, I dare say there's a bunch of ways we can do this. In other words, let's see if we can't use all of today's building blocks to start implementing our own tiny version of Super Mario Brothers in a file, say, called mario.c. So, let me open and clear my terminal window. Let me run code mario.c. And let's just try to do something super simple like print four question marks in a row. Well, to do this, I need printf. So, I'm going to include stdio.h. I'm then going to do int main void. More on that next time. And inside of main, my default function that just automatically, as before, gets called for me, I'm going to print out the simplest possible implementation: just print out four question marks like that. So no need per se for a loop just yet. 
But I think we can go down that rabbit hole too. Let me go down into my terminal window. Make this version of mario. ./mario. Enter. And voila, we have a very black and white, textual version of four question marks in the sky. Now I'm kind of cheating here by just hard-coding four question marks. What if I wanted not four but three or five or some other number? Well, we could do that with a loop too. So let me change this code here and do something like this. For int i equals, say, zero. i less than, say, four for now. i++. Then inside of this loop I can print out one question mark at a time. Semicolon. Now let me go back to the bottom. Make this version of mario. ./mario. Enter. And voila. It's not actually correct this time. So why am I getting a column instead of a row with this here change? Yeah. >> Yeah. So I foolishly included the backslash n after each question mark. Okay. So that seems like an easy fix. Let me get rid of that. Let me now recompile mario. Rerun mario. And now, so close. Now I've just done something stupid. All right. I need the backslash n. So, I think I do want this here. Or what do you propose instead? >> Yeah, I should really put the backslash n outside of the loop. So, once I'm done printing all of the question marks, then I get the backslash n. And that's fine, even though we haven't seen this before. Backslash n is an escape sequence that you can certainly print by itself. So, I do quote unquote backslash n outside of the loop, below those curly braces. Now, if I do make mario, ./mario, now I get the four question marks in a row as well as the new line at the very end. So, again, kind of a little baby exercise, but demonstrative of how you can just take a few different techniques, a few different building blocks we've used, to compose a correct solution to what a moment ago was a brand new problem. Well, let's try another. 
So later on in Super Mario Brothers, when you go into sort of the underground world, you see, or rather it's still above ground, you see a column of bricks like this that he has to jump over. So those here, how might we make a column? Well, we kind of had that solution already. And in fact, if I go back to VS Code here and just change this version of mario, I think we can design this thing to be pretty simply the same. i is less than three though. And I do want to put the backslash n at the end there. Make mario. ./mario. And albeit textual, I've got my column of three, let's see, I don't want question marks. Let's make this a little better. Maybe we'll use the hash symbol because that kind of sort of looks like a square. So, make mario, ./mario. Okay, now we're back in business. But let's make it more interesting by going into Mario's underground now. And here's the third and final Mario problem, whereby we want to implement like this 3x3 grid of bricks circled here. So, this one's interesting because we've never done something in two dimensions. I did horizontal, I did vertical, but we haven't really composed those ideas into the same. So, let me now think a little harder this time about how I can print out row, row, row. And this is where, if you have in your mind's eye any familiarity with like old-school typewriters, it's kind of the same idea, where you want to print a row of bricks, then go back to the beginning, a row of bricks, then go back to the beginning, and a row of bricks. And that's kind of what printf has always been doing for us. It's printing line by line by line of text. It's not jumping around. So, we can leverage that perhaps as follows. Let me go into my main function here. And if I want to print out something two-dimensional, let me kind of think about it as rows and columns. So, maybe I could do this: for int i equals 0, i less than 3, i++. Why? Well, I want to do something three times. 
Even if I have no idea where I'm going with this solution, I at least want to do something three times, like three rows of text. But how about this? On each row, what do I want to do? I want to print out three things. So I could steal this idea, like int i equals 0, i less than 3, i++. And then inside of this loop, let me just print out one brick at a time. No new lines yet. One brick at a time. But there is a bit of a problem here. It is correct to nest loops in this way. Totally fine to have an outer loop. Totally fine to have an inner loop. But I probably don't want the inner loop's variable competing with the outer loop's variable by giving them the same name. And that's fine. It is pretty conventional in code, when you want another integer and it's not i because you've used it already, to use j. So using i and j and k is generally fine. If you're using l, m, n, o, at that point you're probably doing something wrong. There's no hard line, but at some point it gets ridiculous and you should be coming up with better variable names. But i and j, maybe k, is fine. So now what's really happening? Let me suppose that this is my "for each row." This is my "for each column," where I want to print one brick. Now this isn't quite correct, but let me go ahead and make this version of mario, ./mario, and ah, now there's what? One, two, three. There's nine bricks there. So I'm close, right? It's supposed to be 3x3. Nine total. What do I want to do, though, to get this just right? Yeah, over on the left. Yeah. On what line number, or afterward? Where would I put the new line? Because I think I don't want to put it here, because I'm going to get myself into trouble as before. How about in back? >> After the what? >> After 13. Yeah. So, after I finish printing each brick in the row from left to right, I'm going to go ahead and print out, I think, a single new line here, nothing else. And now, if I open my terminal, run make mario again, ./mario.
Now, we've got it. And it's not a perfect square like this one is, because the hash symbols are kind of more vertical than they are horizontal, but it's pretty darn close. The takeaway here being you can certainly nest these kinds of ideas and compose them. And honestly, i and j are maybe making this more confusing than necessary. I could just give these better names, like row, and then maybe col for column, or column. I can spell it out if that's clearer. Column, just to make clear to myself, to my TA, to my colleagues what exactly these variables represent. And indeed, like an old-school typewriter, the outer loop is handling row by row by row. But each time you're on a row, you first want to do column, column, column. And that's what logically the nesting is achieving. And again, if I do make mario, ./mario, all I've done is change variable names. It has no functional effect beyond that. Now, this is a little more subtle, but there is a bit of duplication in this program. There's a bit of magic, and this is subtle, but does anyone want to conjecture what still could be improved here? What is maybe rubbing you the wrong way? >> Yeah, I've hard-coded the three here and here. It's not a big deal. It's an in-class exercise. Who really cares if I'm just manually typing three? But if I want to make this square bigger and bigger and bigger over time, I'm going to have to change it in two different places. And I've conjectured last time and today that eventually that's going to come back and bite you. You're going to do something stupid, or a colleague isn't going to realize you hard-coded three in multiple places. Just bad design. So, how could we fix this? Well, we could just declare a variable like n, set it equal to three, and then use n in both places. And that's pretty darn good. That's better because now we're reusing the value. But we can do one better than this.
It turns out in C, and in many languages too, there's the notion of a constant, whereby you want to store something in a variable but you want to signal to the compiler that this value should never change. And better still, you want to prevent yourself, a human, not to mention a colleague, from accidentally changing this value. You can declare it to be constant, or const for short. So if I go back into VS Code on line five now and say const int, that means that n is an integer that has a constant value. So if I do something stupid later in my code and I try to set n equal to something else, the compiler won't let me do that. It will protect me from myself. So, it's just a slightly better design as well. All right, questions on any of these Mario examples? The first of our sort of real-world problems, albeit simplified textually. All right, let's focus lastly on things we can't really do well with computers, namely some of the limitations thereof. So, here is a cheat sheet of some of the operators we've seen thus far. We played with these with comparison and doing some addition or the like, but here we have addition, subtraction, multiplication, division, and the modulo operator, which is essentially the remainder operator, which you can do with a single operator like this. Let's use some of these to make our own calculator and see what this calculator can and can't do for us. So back here in VS Code, let me open my terminal. Let's go ahead and create a program called calculator.c. And in this program, let's do something super simple initially that just adds two numbers together. So let's include first cs50.h so we can use our get functions. Then let's go ahead and include stdio.h so we can use printf. Let's just copy-paste our usual int main(void). And inside of main let's do this. Declare a variable x. Set it equal to get_int. And let's ask the user what's x, question mark.
Then let's declare another variable y, set it equal to get_int, and ask the user what's y, question mark. Then let's do something super simple, like give me a third variable. Heck, we'll call it z. Set it equal to x + y. And then lastly, let's just print out the sum of x + y. So this is a super simple calculator for addition of two numbers. printf, quote unquote. What's the answer going to be? Well, it's not %s. This was quick earlier. What's the placeholder to use for an integer? %i, \n. And what do I want to substitute for that placeholder? Just z in this case. We haven't quite done this before, but again it's just the composition of some of our earlier ideas. I can go ahead and make this calculator, enter, ./calculator, enter. What's x is 1, what's y is 2, and indeed I get 3. So not a bad calculator. It seems to be working correctly, but it's maybe not the best design. It's generally frowned upon to create a variable like z if you're only going to use it a moment later in one place. Like, why are you wasting my time creating a variable just to use it once and only once? Sometimes it's fine if it makes your code more readable or clearer. And in fact, it might if I called it sum. That's arguably a net positive because I'm making clear to the reader that it's the sum of two variables. But even then, I'm quibbling. I could just get rid of that third variable altogether. And heck, I could just do x + y right here. That's totally fine and reasonable, especially since it's still a pretty short line of code. It's not hard for anyone to read. Feels like a reasonable call. But this hints at, again, my comment on design being subjective. There are no steadfast rules here. Some of the TAs might disagree with me, but this feels fine. It's readable, which is probably the most important thing ultimately. Let's make this calculator, ./calculator, enter, 1, 2, and we still get 3. So the code now is still working.
As an aside, if you're starting to wonder how I type so fast, sometimes I'm kind of cheating with autocomplete. So if I know I want to create a program called calculator and calculator.c exists, I can start typing c, tab, and you can hit tab to sort of autocomplete the rest of the file name if it happens to exist there. Better still, if I want to go back to previous commands I've typed, I can actually use my up and down arrows to go through my history. So if I go up, up, you'll see all of the recent commands I typed, and that saves me time, too. So just little keyboard shortcuts that speed things along. All right. Well, let's do something like this. Not just addition, why don't we use some multiplication? So, how about we prompt the user not for two numbers, but just one initially, x, and let's go ahead and multiply x by two. And I would do x asterisk 2, which is the multiplication operator in C. Let's make this version of the calculator, ./calculator. And now, what's x? Let's do 1. So 1 * 2 is 2. Let's do this again. Let's type in 2. 2 * 2 is 4. Let's do this again. 3. 3 * 2 is 6. And so forth. That's fine. It seems to work. But maybe let's implement a recent meme from the past year or two. How about this? Let's see if you recognize it as we go. So, I'm going to get rid of this code altogether. And inside of my calculator, I'm going to do something like int dollars equals 1 by default. Then I'm going to deliberately induce an infinite loop just for demonstration's sake. Then I'm going to get a character from the user and say something like this, using get_char, which gets a single character.
How about I'll tell the user "Here's this many dollars," %i with a US dollar sign before it, "Double it and give it to the next person," question mark, if you're familiar with that one. And I'm going to prompt them for a yes/no answer, but I'm going to plug in the current number of dollars so they know what they're wagering on. Then below this I'm going to say if the character the human typed in equals equals y for yes, then I'm going to go ahead and do dollars times-equals 2, which recall was our shorthand notation for doubling something. In this case, I could more pedantically say dollars equals dollars * 2. But again, I can save some keystrokes and do dollars times-equals 2 instead. There's no shorthand like ++ for this, no asterisk-asterisk trick. You have to do it in this way, minimally. However, if the user does not want to double it and give it to the next person, then let's do an else and just break out of this infinite loop altogether. But notice what I've deliberately done in get_char, similar to printf. I have included a placeholder. We implemented get_char and get_int and get_string just like printf, in that you can pass in placeholders and plug in values. Why? Well, again, for the meme's sake, I want to be able to tell the user how much money I'm about to hand them when I ask them the question. Do you want to double it and give it to the next person? I want to see the number. And the dollar sign is just because we're talking about dollars. The %i is because we're talking about integers. All right. If I didn't mess this up, let's make this version of a calculator, or meme. So far so good. ./calculator. Enter. Here's $1, which was the initial value of my dollars variable on line six. Double it and give it to the next person. All right. y. Here's $2. Double it and give it to the next person. Okay. Okay. Okay. Okay. Okay. I'm going to do it faster. It's getting pretty good. You can see the power of exponentiation. It's getting pretty high. Let's keep going.
Keep going. Lot of dollars. Too far. That does not happen in the memes. What happened here? What's going on? Yeah. What do you think? >> Exactly. Good intuition. Because the computer only has a finite number of bits allocated to each integer. I hypothesized earlier that it's usually 32 bits, maybe 64 bits, but it's finite, which means you can only count so high, and it's roughly 4 billion. Or, again, an integer by default can be negative or positive, so it's roughly 2 billion, and that's pretty close to what we were getting here. In fact, we overflowed the integer in memory. Integer overflow is a term of art whereby you can overflow an integer by trying to store too big of a value in it. And the reason for this is, again, to make this clear, this is a piece of memory inside of a laptop or a desktop or some other device. And in these little black chips is a whole bunch of bits, or really bytes, that can store information electronically. But they allocate those bits in units of 8, maybe 16, maybe 32, maybe 64, but finitely many per value. And whether we're using 32 or 64, you can only count so high if you have a finite number of bits. And we've seen this problem even on a small scale with our light bulbs last week. If we have a three-bit number, as represented by three physical light bulbs or three tiny transistors in the computer, I can count from zero to one to two to three to four to five to six to seven. If I want to count to eight, though, I need a fourth bit. But as the red suggests, if you don't have a fourth bit, for all intents and purposes that number is just zero. Or as an aside, depending on how you're representing your number, sometimes a leading one indicates that the number itself is negative, which is why in VS Code we actually saw both symptoms. First, we went negative because we wrapped around logically, much like that one resulted in our getting back effectively to zero, and then we did indeed end up on zero ultimately.
So, how can we chip away at this? Well, a couple of solutions perhaps. Let me close my terminal window here, and instead of using an int, well, let's just kick the can down the road. Let's use a long, which is 64 bits. So at least we can give away even more money in this scenario. I can't use %i and need to use %li now for a long integer. But I think at this point, if I go back to VS Code's terminal window here... Oh, and I quit that program by hitting Ctrl-C quickly. Now I'm going to go ahead and do make calculator again, ./calculator. And I'm just going to keep hitting y. But because I'm using a long int now and thus 64 bits, if I do this long enough, it's going to get crazy high and much, much higher than before. High enough that I'm not going to keep hitting y, enter, because we're never going to hit the boundary. But eventually, especially if I did this in a loop automatically, it would certainly... Oh. Oh, okay. I guess exponentiation works fast. Okay, so it did work. I didn't think I was going to hit it enough times, but the same problem happened again. We overflowed this long integer even using that many bits, because I was talking so long that I kept hitting y enough times to overflow even that long integer. So that too was a problem, and this happens truly in the real world. So pictured here is a Boeing 787 from a few years back, long before there were all the more recent problems with Boeing planes, whereby after 248 days of continuous power, which is kind of a thing in the aviation industry, time is money, and generally they want the planes in the air as much as possible, which means they want them powered on as much as possible, which means they don't turn them off at night. They keep them going and flying.
After 248 days, the New York Times reported a few years back that a model 787 airplane that has been powered continuously for 248 days can lose all alternating current electrical power due to the generator control units simultaneously going into failsafe mode. This condition is caused by a software counter internal to the GCUs that will overflow after 248 days of continuous power. Boeing is in the process, at the time, of developing a GCU software upgrade that will remedy the unsafe condition. So literally what this means is that the power to these planes would just shut off if the planes were on for more than 248 days at a time. And it was a common thing for planes to be powered on as much as possible. Why was this actually happening, or what was the solution? Well, the short-term fix, because it took a while for Boeing to fix this, was what? What would you do if the symptom is that the plane shuts off mid-flight after 248 days? Yeah. >> Turn it off and back on. Literally turn it off and back on again, much like you've probably been taught with your phones and computers and any other electronic devices that somehow freak out on occasion. Reboot the plane. Now, why is that? Well, anytime you reboot a phone or a laptop or a plane, all of those variables get reset to their default values, which, if that's what the first line of code does, as in some of my examples, means the counter gets set back to zero again, because the code is executed again from top to bottom. So, this effectively solved the problem. And when they finally rolled out a fix, you didn't have to do that anymore. But the source of the problem is essentially that they were probably using 32-bit integers, but also negative values. So they had 31 bits at their disposal to count positive numbers.
And 248 days is roughly how many hundredths of a second fit in those 31 bits, which means once you count in hundredths of a second for 248 days, you would overflow an integer and the power would shut off, effectively because something ended up going to zero. So, there was a lot of sort of marketing speak or technical speak in that, but it boiled down to just a simple integer overflow. There's a historical bug in Pac-Man, if you've ever played this in any of its forms, whereby you can play up to level 255, but because there was a missing if condition that checked what level you were on, you could accidentally garble the screen if you were amazing at Pac-Man, because they too would overflow an integer and just random characters would end up appearing on the screen. So, it's sort of like a badge of honor to actually hit level 256 in this way because of this bug. But there are yet other issues we can see here. And if you don't mind, we might go a couple of minutes over, but let me just demonstrate what these examples can do for us here. If I were to revamp my calculator here as follows, by clearing my terminal window after hitting Ctrl-C to kill that, let me go ahead and get rid of all of this meme code here. Scrolling down to the inside of main, let's just do a couple of things like this. int x equals get_int, quote unquote, what's x, question mark. Then let's go ahead and do int y equals get_int, quote unquote, what's y, question mark. Then let's go ahead and print out just x / y. So here's a %i, \n, x / y. This would seem to be a calculator now for division, which I can make as before. And actually, sorry: missing terminating... oh, sorry, I'm missing a double quote. That was an unintended bug. So, if I make this calculator, do ./calculator, type in 1, type in 3, I get zero, which is weird. What if I do instead maybe 2 and 3? It's zero instead of 0.66. What if I do 3 and 3? Well, that curiously works.
But if I do something like 4 and 3, which would be 1.33, that too doesn't seem to work. So there's this other issue in computing, known as truncation, whereby even when you're trying to do floating-point math, like with a decimal point, if you are using an integer you're going to throw away everything after the decimal point unless you're explicitly using the right data type. And we saw an allusion to this earlier. If I actually go in now and change my values from integers to floats, and change the %i to a %f, and remake this calculator, now I can do 1 / 3 and I actually get back the right response. But there's another issue latent here, which happens too in the real world, whereby I'm going to tweak this %f to be a little arcane. It turns out you can tell C how many digits you want to show after the decimal point by just using a dot and then a number, like 50 arbitrarily. And contrary to what you might have learned in grade school, this calculator would seem to think that, ./calculator, 1 divided by 3 is not 0.3333 infinitely many times. There's all this random stuff happening at the end. Long story short, this is because computers only use finitely many bits even to represent floating-point numbers. And if there's an infinite number of those, you can't possibly represent every possible floating-point value. So we're essentially seeing an approximation of 1/3 rather than 1/3 precisely. But this too happens quite a bit in the wild. There's really no solution to this other than throwing more bits at the problem, using a double instead of a float, or at least somehow trying to detect this and catch this. That then is what we'd call floating-point imprecision. But to tie this together and sort of induce a bit of fear for the coming years, these things happen all of the time.
Back when I was finishing school, there was the so-called Y2K problem, or year 2000 problem, whereby for decades computers had been using not four digits to represent years, but just two, because it was convenient. It was more efficient because you use half as much memory to represent maybe the year 1999, just using two digits instead of four. Of course, when the year rolled around from 1999 to 2000, if you didn't have the full four digits in memory, you might confuse 2000 with 1900, which was the presumption if you're only storing two digits. So, we screwed that up. And thankfully, the world scrambled. And if you read up on Wikipedia and news articles from the time, everyone thought the world might very well end, but it didn't. So, you'd think we'd have learned our lesson. Unfortunately, another such problem is coming up in the year 2038, whereby historically, since the 70s and prior, computers have generally used 32-bit integers to keep track of time, the date and the time, by means of counting how many seconds have passed since January 1st, 1970. And all of the math is just relative to that date because that's when computers were really starting to come onto the scene, if you will. Unfortunately, there are only 4 billion values you can count to, or 2 billion if you're doing negatives, from January 1st, 1970. And so, on the date January 19th, 2038, we will overflow a 32-bit counter. And suddenly, if this problem is not fixed by you or other people before the year 2038, our computers and phones and other devices may very well think it's December 13th, 1901. So, there are solutions to these problems. CS50 is all about empowering you with solutions to these problems. But if you'd like to scan this code here, this will add that date to your Google calendar or your Outlook calendar. Keep an eye on it. That, though, is week one for CS50. Problem set one will be in your hands soon. We'll see you next time. [applause]
[music] >> One fish. Two fish. Red fish. Blue fish. >> Congratulations. Today is your day. You're off to great places. You're off and away. >> It was a bright, cold day in April, and the clocks were striking 13. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from entering along with him. All right, this is CS50 and this is week two. And if we could, after this dramatic reading, a round of applause for our volunteers. [applause] So we can now take for granted from week one that we have a new way to express some of the ideas that we first explored in week zero, like functions and conditionals and variables and the like. And now we're doing in C what we used to do in Scratch. Today what we're going to start to focus on is some real-world problems, so that we can take for granted that we have that expressiveness. We have some tools in our toolkit and can actually start to solve some real-world problems, or problems representative thereof. In particular, the real-world problem that we're going to start today and this week with is that of reading levels. Odds are, when growing up, you read at a certain level based on your age. Maybe it was first grade level or fifth grade level or 10th grade level or the like. And that was a function of just how comfortable you were with the words in the book or words on the screen that you were reading. What you've just heard, thanks to our volunteers, are three different reading levels that each of these three volunteers reads at. And in fact, why don't we go ahead and hear them again and be a little more thoughtful this time so as to assess at what reading level your classmate is reading. So, let's start with Leah, if you'd like to introduce yourself first. Hi, I'm Leah. I'm a first year in Holworthy. And here is my little thing.
One fish, two fish, red fish, blue fish. >> So, at what reading level would you say Leah reads, based on her recitation thereof? Yeah, in the front. >> Kindergarten. >> Kindergarten. Okay. So, a fairly young age. And what makes you say kindergarten? >> She is speaking in very short phrases without much complexity. >> Okay. Very short phrases without much complexity. And indeed, according to one scientific measure that we'll explore in this week's problem set, we would say that Leah reads before grade 1, so kindergarten would indeed be apt. But welcome to the stage here. Let's move on now to Maria, if you'd like to introduce yourself. >> Yeah. Hi, I'm Maria. I'm in Stoughton, thinking of applied math. Congratulations. Today is your day. You're off to great places. You're off and away. >> Another familiar phrase, perhaps. At what reading level would you say Maria is? Well, yeah. Over here. >> Third grade. >> And what makes you say second or third grade? >> Okay. >> So, now we're starting to introduce complexities like rhyming and a bit more substance to the quote. And indeed, based on that reading, that same measure that I described earlier, which will involve a mathematical function that somehow analyzes what it is Maria just said, would conclude that she read at a third grade level, or grade three. Finally, Omar, if you'd like to introduce yourself and read yours once more. >> Okay. So, hi everyone. I'm Omar. I'm a freshman at Earl, but thinking of doing comp sci, and this is my reading. It was a bright cold day in April and the clocks were striking 13. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent the swirl of gritty dust from entering along with him. >> All right, sort of escalated quickly. What reading level is Omar at, would you say? Someone else. What might you say or estimate?
Yes, right here in the front. >> Eighth grade. >> Okay, eighth grade. And what made you say that? >> More complex sentences, more complex words. >> And indeed, according to that same measure, this full paragraph of text now, which indeed has even more grammar when you see it there on the screen, would be said to be at grade 10 because of that added complexity. So, with that said, we're going to need to be able to somehow sort of crunch these numbers to determine, given a body of text, at what reading level someone is. But in order to do that and apply any metrics to a body of text, we're going to need to represent that text in memory using something like strings from last week. But last week with strings, we could really just print them out or display them wholesale on the screen. But I think we're going to need to break down these various texts and others like them at a finer-grained level. And indeed, among the goals for today is to explore exactly that, and also to take the proverbial hood off of the car to take a look underneath at how the computer is actually working, how these things like strings are actually functioning. So, if you could join me one last time in a round of applause for our volunteers. Thank you so much for helping out. Thank you guys. [applause] Thank you. Thank you to Maria as well. So among the goals for today, beyond exploring a representative problem like this of reading levels, is going to be another one which is even more important and more omnipresent than reading levels, namely cryptography: the art of scrambling information, or specifically encrypting it, so you can send secure communications. Now you sort of take this for granted increasingly nowadays, that when you send a text message or perhaps an email or check out online with a credit card, somehow or other your information is secure. And over the coming weeks, we're going to explore to what extent that is actually true and why or why not.
Now with cryptography, similarly too, if we want to be able to send messages securely, such that if I want to send a message to you, I don't want anyone else in the room to be able to figure out what it is I have said, even if they physically intercept that message, which is all too possible in a digital world. We're going to need to come up with metrics and mechanisms for actually scrambling information in a reversible way, so that I can write my message and somehow scramble it. You can receive that message, even after it's passed through many other hands, and you can descramble or decrypt that same message. So for instance, here on the screen is a message, a fairly simplistic one, that has somehow been encrypted. And we'll see by the end of today and by the end of this week that this encrypted message, and there's a bit of a tell on the end there, actually will be said to decrypt to "this is CS50." But why is going to be the underlying question, and what additional tools do we need in our toolkit in order to do that? Another word on tools. So, up until now, you've probably experienced some bugs, whether it was in Scratch or ever more so in C. In fact, don't feel too bad if the very first program you wrote in C didn't even work. You couldn't even make it or compile it until you went back and fixed some of the code that you had written. Well, it turns out that bugs, mistakes in programs, are ever so commonplace. And even though we've already provided you with tools like the virtual rubber duck at CS50.ai, also embedded into VS Code at CS50.dev, of whom you can ask questions along the way, among the goals today are to give you some lifelong tools for how you can actually debug software yourself when you don't have a duck nearby, when you don't have a TA nearby, let alone any humans at all. So with debugging, there's going to be a number of techniques that we can use, all toward an end of finding and removing bugs or mistakes from our software.
And perhaps the person best known for having popularized this term of bugs is Dr. Grace Hopper, pictured here, who was a rear admiral in the Navy and was one of the original programmers of the so-called Harvard Mark I, a very early mainframe computer that, if you wander across the Charles River over to the Science and Engineering Complex here at Harvard, you can actually see part of on display still in the lobby. It was succeeded by the Harvard Mark II. And on the Harvard Mark II, Dr. Hopper and her team were known for having put this note in their log book after having done some number crunching on the system there. And if we zoom in, they had found a problem with the computer this one day whereby there was literally a bug, a moth, inside of the circuitry of the computer. And as was written here, "first actual case of bug being found." And ever since then do we say, ever more so, the phrases bug and debugging when it comes to finding and eliminating problems in our code. So let's start with just that. In fact, let me go over to VS Code and let's deliberately make some mistakes together that might very well be reminiscent of some of the mistakes you've accidentally made thus far, but along the way give you all the more tools for solving those problems, as opposed to having to ask someone else, be it virtual or physical, for help, so that you can actually find these mistakes in your own code. Let me go ahead and consciously in VS Code create a program known to be buggy called buggy.c. And in this program, let's go ahead and do some fairly familiar code initially. I'm going to go ahead and start just like we did last week with int main(void). More on that today before long. Inside of my curly braces, I'm going to say printf hello, world. That's it. Now I'm going to go back to my terminal window here. I'm going to go ahead and do make buggy to make a program from that source code.
But before I do, odds are, even after just a week of this stuff, you can probably spot a few mistakes I've made, a few bugs. What do you see wrong already? Yeah. >> Include standard. >> I didn't include stdio.h, the "standard I/O" header file, which is important because it tells the compiler that I plan to use functions declared therein, like printf, which clearly I'm doing. So let me go in and include stdio.h. What else seems to be wrong here? Yeah. I'm missing a semicolon at the end of line five here. So I'm going to go ahead and add that in. And this next one is subtle and arguably not a bug, but maybe an aesthetic detail. What else have I arguably done wrong? Yeah, in back. >> Yeah, I forgot my backslash n, the newline character that moves the cursor to the next line so that when I get a new prompt, it's on a fresh line of its own. Again, more of an aesthetic, but certainly a pretty reasonable thing to do. So let me go ahead now and, in my terminal window, run make buggy. And it indeed compiled. But had I not fixed those mistakes, I would have triggered a whole bunch of error messages as a result. In fact, let's rewind in time, undo the fixes I just made, go back to the original form here, and try running again: make buggy, Enter. And we'll see some scary-looking messages up here. Let me scroll up to the top of the output, where we see buggy.c and line three. That's where the problem is right now. Error: call to undeclared library function printf with type, and then it starts to get a little more complicated, but I do see clearly that it's calling my attention to printf.
So hopefully at some point, if not last week then from this week onward, your instinct will be: ah, all right, I'm an idiot, I forgot the header file in which printf is actually declared. It's not a huge deal, and it's going to come with practice. That's how I might know, more intuitively, what the solution here might be. Now here's another common mistake. I've just gone in and fixed it, but I did do something wrong, and hopefully none of you actually did this, because it's an annual FAQ. What did I just accidentally do wrong? It's not studio.h, it's stdio.h. So do kind of ingrain that one: standard input/output. The next bug, which I haven't yet fixed, is that semicolon. So let me clear my screen and rerun make buggy. I should no longer see that first error message. But I now do see another error message on line five: expected semicolon after expression. All right, that one's pretty explicit. So I'm going to go ahead and fix this. But notice that up until now, my code wouldn't have been able to compile because of those two errors. The compiler stopped and showed me these error messages instead. At this point, if I run make buggy, Enter, it does in fact compile. And yet it's arguably still buggy, because when I run ./buggy, I get my prompt on the wrong line. So this is a distinction now between a syntax error, a programming error that outright stops my program from compiling, sort of a dealbreaker, versus something that's more of a logical error: I actually meant to move the cursor to the next line. So there are different types of errors in the world, as we're seeing here. Of course, if I add the newline, rerun make buggy, and again run ./buggy, now we're back in business, with the program displaying exactly what I intended. All right. Well, let's modify this to look a little more like something else from last week. Recall that last week I started to get someone's name more dynamically.
So I said something like name equals get_string. That was a function we introduced. And I might have said something like this: "What's your name?" with a question mark and a space, just to move the cursor over. I know now that I definitely need to end my thought with a semicolon. I could try to compile this with make buggy, and now I'm seeing a different error message altogether that you might not have seen yet. On buggy.c line five: error, use of undeclared identifier name. What now is the mistake that I've made? Why does it not know? >> Declare the type. >> Yeah, I forgot to declare the type of this variable, which, for those of you with prior programming experience, is not something you have to do in some languages, like Python for instance. But in languages like C, C++, Java, and others, you do in fact need to explicitly tell the compiler that you want to create a variable in the computer's memory by telling it its type. And it's not going to be an int, because I don't want an integer, of course, in this case. I want text, which we now know to be called a string instead. All right, I think this fixes that bug. So let me do make buggy again. And hopefully... huh, a fatal error this time, again indicating that my code did not compile. On line five, I still have an error, but this time it says: use of undeclared identifier string; did you mean stdin? So this is a bit of a red herring. The compiler is trying to be helpful by asking whether I meant stdin, but I don't think I did. That just is the most similar-looking word in the compiler's own memory. What's the actual mistake that I've made here? Yeah. >> You didn't include the CS50 library. >> Yeah, I didn't include the CS50 header file, because string, recall, is a feature of the CS50 library, as are get_string and get_int and others. So the solution here is indeed to go up here, and, just to be nitpicky, I tend to alphabetize my header files.
It's not strictly required technically, but stylistically I find it nice to be able to skim the header files alphabetically to see if something is there or not. I can include cs50.h in addition to stdio.h, and it's in that file, cs50.h, that not only is get_string declared, so that the compiler knows that it exists; it turns out so is the word string. So this is a bit of a white lie, and this is something we do in the early weeks of the class. We dug up these old training wheels from a bicycle, the whole idea being to keep you upright and spare you too much complexity early on. The point of these training wheels, in the form of the CS50 library, is to let us ignore what a string really is for just another week or two, after which we will peel back that layer, take off those training wheels, and reveal to you what is actually going on. So, for now, strings exist, but they exist because of the CS50 library. In a couple of weeks, they're still going to exist, but we're going to call them by a different name, as we'll eventually see. But everyone in the real world, every software developer, uses the word string. So this is a concept that exists. It is not CS50-specific at all. It's just that in C, the word string doesn't typically exist unless you make it so, as we have. All right. So I think now, if I clear my terminal window and rerun make buggy, it should in fact compile. And if I run ./buggy, Enter, I should be able to type in my name. And now, voila: hello, world. So this is not a syntax error, because I didn't screw up my code per se; it compiled. Everything is grammatically correct, so to speak, but logically, intellectually, this is not what I wanted, right? I wanted it, presumably, to say hello, David. So let's fix one final bug here. How do I fix this? On what line? How do I get it to say hello, David? Yeah. >> Yeah, on line seven, I need to add the string placeholder, the format code, so to speak, percent s.
And then one more thing. Someone else: what do I do after this? Yeah, in back. >> Yeah, a comma, and then the name of the variable that contains the value I want to substitute in there, which is indeed name, though I could have called it anything I want. All right, so now make buggy, Enter, seems to have compiled. Again, ./buggy. Now I type in my name once more, and we're back in business. So over the course of these few exercises, clearly I meant to make most if not all of these bugs, these mistakes, but they demonstrate not only syntax errors, which are just going to stop the compiler in its tracks, since you won't even be able to compile your code until you fix those things. Even after that, there can be latent bugs that seem not to be there until you actually provide input and see what's happening at so-called runtime, when you're running the actual code. And so here's where it's no longer as easy as just reading the error message and figuring out what it means, because no error message appeared on the screen when it said hello, world. We had to use our own human intellect and realize: okay, that's clearly not what I wanted. Had you run CS50's own check50 program on something like that, we could have told you that it's not correct by automatically assessing its correctness. But the compiler has no idea what you are trying to achieve logically. It only knows about the language C itself and the requisite syntax for writing and compiling code. So how could we go about solving logical problems in code? I would propose that we start to consider this list here, whereby when you want to find a logical problem in your code and better understand what's going on, or really what's going wrong, printf is going to be your friend. Up until now, we've used printf to literally print on the screen hello, David, or hello, Kelly, or anything else.
But you can certainly use printf temporarily to just print things out inside of your program that you might want to better understand. And then once you understand them and have solved the problem, fine, you can delete those temporary lines of code, recompile, and move on. So let's use printf as a debugging tool in that sense. Let me go back over to VS Code here, and in this same program, buggy.c, delete everything and start over with a different sort of bug. I'm going to include stdio.h at the top. I'm going to do int main(void) after that. And then inside main, I'm going to write a simple for loop that just prints out a stack of three bricks, like we saw in the world of Mario when Mario needed, we claimed, to jump over a stack of bricks. We want to print out just three of those at the moment. So I'm going to go ahead and say for int i equals 0; i is less than or equal to 3, because I want three of these; i++. Then inside of this for loop, I'm going to quite simply printf a hash symbol to represent the brick, followed by a newline to move the cursor to the next line, and a semicolon to complete the thought. Now, I've deliberately made a stupid mistake here, but in the context of a simple enough program that we can focus on the debugging technique, not on the obscurity of the bug in question. Hopefully you'll spot the bug in just a moment, if not already. When I do make buggy now and ./buggy, I don't get three bricks. I of course get one, two, three, four in total. So there's a logical bug in this program, and odds are you can already spot what it is. But let me propose that this program is representative of a type of problem that you can solve a little more diagnostically by poking around and asking the computer, via printf, to show you what's really going on.
And I would propose that this is one of the most helpful techniques in a situation like this, if you're trying to wrap your mind around why there are four bricks instead of three. Well, clearly this is related to the loop somehow. So let's look a little more thoughtfully at what the value of i is before we print out each of those bricks. I might literally do something like this temporarily: printf, quote, i is %i, backslash n, close quote, and then i. That way I can print, right here and now, the value of i, just so that I can actually see it. Let me now go down into my terminal window, make buggy again, ./buggy. And now, and I'll full-screen my terminal, I get some diagnostic information at the same time. So when i is zero, I get a brick. When i is one, I get another brick. When i is two, I get another brick. When i is three, I get a fourth brick. So now I can kind of see that, okay, my loop is working, but I'm going too far. I'm going too long. Now, I can do this even more succinctly. For what it's worth, I don't need a whole new printf statement. I could just go into my existing printf, put my %i there, and then maybe a space just to scooch things over, and print out i in that same line. If I now do make buggy and ./buggy, okay, now I'm seeing that I'm printing a hash, a brick, for each value of i: 0, 1, 2, and also 3. So the problem, of course, is that I shouldn't be starting at zero and iterating while i is less than or equal to three. The solution is: ah, I'm an idiot, I should have said less than three. Or, if I prefer to count starting at one like a normal person, I could have set i equal to one and then gone up to and through three. But as I claimed last week, the canonical way, the most common way to do this, is to start counting at zero and go up to, but not through, the total value that you have in mind. But there's going to be another technique that's worth knowing here.
Let me go ahead and abstract this away by whipping up a slightly better variant of this, as follows. Let me delete this for loop. Let me assume for the moment that inside of main I'm going to ask the user for the height of a pyramid. And I'm going to do something like this: int h equals get_int, and let's prompt the user for the height of this pyramid, or this wall. And then let's assume there exists a function called print_column, which takes as input a number h, which is how many bricks you want to print. Now, this function, print_column, does not exist yet. get_int does exist, but I don't have access to it. So let me not make the same mistake twice. What do I need to add at the top of this file? Yeah. >> CS50 header file. >> I need the CS50 header file because I'm using the get_int function now, which again comes from our library, not from C. So let me go ahead and include cs50.h. But print_column I can invent myself. So let me go ahead and say void print_column, and int height in parentheses. More on that in just a moment. And then I'm going to recreate the loop from before: for int i equals zero; i is less than or equal to height, so I'm deliberately, for now, making that same mistake as before; i++. And then inside of this for loop, I'm going to print out a single hash and a newline to represent that brick. So now main can use a function called print_column. It's going to pass in the value of h, and then this for loop in the print_column function is going to take care of printing this thing for me. So let me do this again. make buggy, Enter. So far, so good. ./buggy. Let's put in a height. I'm going to manually enter a height of three, and I should see three bricks. But of course, I'm still seeing four. Now, before we move on, let me hide my terminal and propose that it's stylistically bad to put anything other than your main function at the top.
But recall that if I move my helper function, print_column, below main, and it's a helper function insofar as I made it to help me solve another problem, I can't recompile and run my code now. Why? The compiler won't let me. Yeah. >> Exactly. When the compiler gets to line seven of my code, it's going to abort compilation because it doesn't know what print_column is. Why? Because I don't tell it what it is until line 10. And this was the only time I proposed that copy-paste is reasonable: highlight and copy the very first line of that function, and paste it above main with a semicolon. That's a so-called function prototype. It specifies what the function's name is, what its inputs are, if any, and what its output is, if any. More on these inputs and outputs later on. But now this is just a more complicated, but more modularized, version of the same program. Let me do make buggy. It still compiles. ./buggy, type in three, and I still have that same bug. But the catch now is that my code has gotten more complicated. Having abstracted away this idea of printing a column into a new function, there's just more code now to debug. I could certainly go in there and start adding printfs, but at some point printf is going to be a very primitive tool, and you're going to waste more time adding printfs, recompiling your code, running your code, changing the printf, recompiling your code, running your code. It's going to get very tedious quickly when you have lots of lines of code on the screen. So can I actually step through my code line by line, maybe like your TA would in a section or a small class, walking through the code? You can, because another tool that you have access to is one called debug50. This is a CS50 command that will start an industry-standard debugger.
And a debugger is a piece of software, used in the real world, that literally lets you do that: debug your code by letting you slow down or even pause execution and walk through your code line by line. The only reason we call it debug50 is that in VS Code it's a little annoying to start the debugger. And so we automated the process of starting the debugger, but everything thereafter has nothing to do with CS50 and everything to do with real-world software engineering techniques. So how do we use this? Let me go back to VS Code here, and let me propose that I want to step through this code line by line, just like we might at a whiteboard in a smaller class, to figure out why I'm getting four hashes instead of three. Well, in my terminal window, what I'm going to do is this: debug50 ./buggy. So debug50 is the command. It needs to know what program I want to debug, so I'm specifying ./buggy, which is the name of the program I just compiled. I'm going to get an error, though, the first time I run this, as will you if you make the same mistake. I'm about to see this message here: Looks like you haven't set any breakpoints. Set at least one breakpoint by clicking to the left of a line number and then rerun debug50. So what is this really telling me? Well, the debugger has no idea when and where I want to pause execution so as to start walking through my code line by line. It wants me to tell it where to break, that is, where to pause, by clicking next to a line number. So let me hide my terminal for just a moment. You've probably never done this intentionally, but if you hover over the space to the left of your program's line numbers, you'll see a little red dot, a little stop sign of sorts. If you actually click there, that red dot will stay. And you can see the hover text here saying click to add breakpoint. What I'm going to go ahead and do is click to add a breakpoint at main.
main is the entry point to my program. It's the default function that gets called. Let's break right away so I can step through this code line by line. All right, let me reopen my terminal window, clear it, and then run debug50 again with ./buggy, Enter. And now a whole bunch of stuff is going to happen quickly on the screen. And then it's going to clean itself up, because once the debugger is running and ready to go, it's going to allow me to start stepping through my code line by line. So what is going on? Well, notice that nothing has happened in the terminal yet. Why? Because my code has been paused inside of main. In particular, it's been paused on the first real line of code. The curly brace is uninteresting. The first line is just the function's name, essentially. So line 8 is the first juicy line of code that could possibly do anything useful. It's been highlighted here in yellow. And the fact that this cursor is here means that we have broken execution on this line, but we have not yet executed this line, which is why in the terminal I don't see anything yet. I definitely don't see height followed by a colon. Notice what else has happened here. All of a sudden, on the left-hand side of the screen, where your file explorer typically is, or where the CS50 duck typically is, we see mention of variables. You can actually see inside of the debugger what the value of any variable in the computer's memory happens to be. Now, I don't quite understand this right now, and we'll come back to it over time, but weirdly, before line 8 even executes, it seems that h has a default value of 32,764, which seems to have come from nowhere. As an aside, this is what's called a garbage value. And this is actually why we have Oscar so omnipresently here. A garbage value tends to be a default value inside of a variable that's the result of that memory having been used previously for something else.
Inside of your computer, you've got all of this memory: random access memory, or RAM. More on that today. And it stands to reason that my computer, or whatever cloud server we're using, has been running for some time. So the bits that h is going to use might already have some random switches on and off, some random pattern of bits that happens to give me 32,764. But the moment this line of code executes, that value is going to get changed to what I actually want it to be, which is what the human is going to type in. Meanwhile, at the bottom here, you'll see a so-called call stack. More on this too in the weeks to come, but you'll see that we've paused in the function called main in the file called buggy.c. So how do I do something useful? Well, at the very top of the debugger, you'll see a whole bunch of color-coded icons. One looks like a play button. If I click that, it's just going to continue execution of my code, as though I don't want to step through it anymore. So I'm not going to click that just yet. The second icon, which is a little curved arrow over a dot, is the so-called step over button, which means step over this line and execute it, but only one line at a time. Let's go ahead and do exactly that. So I'm going to click the step over icon, the second one, which is the curved arrow with the dot under it. Click. Now I see in my terminal window that I'm being prompted for the height. All right, let's go ahead and type in three, just like I did before, and hit Enter. Now notice what happens. Execution has paused on line 9 instead of 8. And you'll see that my variable, a so-called local variable, has the value of three, as intended. All right. So far, this isn't all that enlightening, other than demonstrating that I can pause execution of my program anytime I want. So let's now click that step over button again so that we actually print this column. Click. And there we have it: four hashes at the bottom of the screen.
Now execution has paused at the end of the function. This is just my opportunity to stop, restart, or continue. I'm just going to go ahead and click the play button and let it finish executing. Unfortunately, that wasn't really enlightening at all, except to confirm for me that I typed in three and three is what is in the computer's memory. Not that interesting yet. So let's do this. Let's leave the breakpoint on line six, as before. Let's rerun the debugger by running debug50 ./buggy. Let's let it do its startup thing, which looks a little messy at first, but now it has highlighted line 8 again. I'm going to go ahead and step over this line because I do want to get an int. I'm going to type in three again, Enter. But this time, instead of stepping over line 9 and just letting print_column happen, this is where the debugger gets powerful. Let me step into line 9 and walk through the print_column function itself, line by line. So let me go ahead and click not this button, which is the curved arrow over the dot, but the next one, which is the step into button. Click. And now you'll see that execution has jumped inside of print_column and paused on line 14, at which point I can see at top left what the default value of i is. And this is some crazy garbage value, because whatever bits are being used to store i's value hold some random garbage from some previous use of that memory. But as soon as line 14 executes once, I bet i is going to take on a value of zero. So let's do that. I'm going to click step over, because I don't need to step into this line; there are no other functions there. Step over it, and immediately at top left i is now zero. Now line 16 is highlighted. Let's step over this. Okay, and notice in the terminal window, what do you see? The first of our hashes. Let's step over. Step over. Second hash, and i is now one. Step over. Step over. Now we see a third hash, and i is now two. Step over. Step over.
Okay, there's the symptom of the bug: four hashes, and yet i is three. But wait a minute, this is going to draw my attention now to line 14 before I continue onward. Three is, of course, less than or equal to three, which is why I got that fourth hash on the screen. So at the end of the day, you still need to exercise some of your own human intellect to figure out and understand what's going on. But the value of this debugger is that you can pause and work through things at your own pace, poke around inside of your own code, and better understand what's happening, as opposed to compiling the program, running it, and having to infer from the symptoms alone what the source of the problem might be. So that was a lot. Let me go ahead here and just let it continue to the end, because I know what the problem is now. I need to change the less than or equal to sign to a simple less than instead. Questions, though, on debug50 or any of these steps? Yeah. >> I have two questions. >> Sure. >> Could you go over what the breakpoint thing is? And then my second question was around the garbage. The second time you ran it, it still gave that same garbage value even though you had assigned to h. >> Correct. So, in order of your questions, what again are these breakpoints? The breakpoint, the little red stop sign here, just tells the debugger where to pause execution. Frankly, I didn't have to pause execution at main. If I really care about debugging print_column, I could have clicked down here instead, and then it would have just run main automatically and only paused once print_column gets called. So a breakpoint is where your code will break, the point at which it will pause. As for the garbage values, I'm oversimplifying exactly what's going on inside of the computer's memory. It's not necessarily using exactly the same memory as before, but the operating system will govern exactly how the memory is laid out.
This is actually a significant problem, long story short, in a lot of today's systems. It's not that interesting to me to know that there was 32,000, whatever that number is, or the negative number. But suppose that it revealed the password of some other program or function that had left some information there. It seems all too easy with the debugger, let alone C, to poke around the computer's memory. And we're going to come back to that in a couple of weeks. But for now, it's a garbage value insofar as you didn't put the value there. It somehow got there on its own. Other questions? >> When you have a for, does i become equal to one at the end of the for or at the next? >> So the question is about the order of operations for a for loop. The first time you go through a for loop, the initialization happens, the stuff before the first semicolon, and the condition, the Boolean expression, is checked. Then everything inside of the curly braces is executed. Then the incrementation, or update, happens, which in this case is i++, and then the condition is checked again. The code is executed. The update happens. The condition is checked again, the code is executed, the update happens. And so it starts to loop like this. The debugger's graphics are fairly simplistic, and it just highlights the whole line without making super clear what's happening. But that's just the definition of a for loop. Good question. Others about debug50 or printf? All right. Yeah. >> Can you change the position of i++ and height? >> Short answer, no. The first thing is the initialization, the variable you want to create and initialize. The second thing is the actual condition, the so-called Boolean expression. The third thing is always the update. So they must come in this order. What you're not seeing is that you can actually have multiple Boolean expressions, you can have multiple initializations, you can have multiple updates, but we're keeping it simple for now.
And this is canonical. All right. So, to make clear, assuming that either printf or debug50 helped me figure out where the illogic was in my thoughts, I now know that the fix here is just to change the less than or equal to into a simple less than. And if I run the program again, of course, it's going to give me the three bricks that I always wanted. But there are other techniques we can use too. Besides printf and debug50, you might wonder why we have a 7-foot duck behind me here, and all of these little rubber ducks on the floor. Rubber duck debugging, per week zero, is actually a thing. It was popularized in a book some years ago, and the idea is that when you are facing some bug, some mistake in your program, or you're just confused about some concept, there is anecdotal evidence to suggest that just talking through the problem with an inanimate object, like a rubber duck on your desk, is often enough for that proverbial light bulb to go off over your head, because you hear in your own words what confusion you're having, what illogical thoughts you're having, and you don't even need another human or TA or AI in the room to solve the problem for you. So, in fact, on the way out today, at the end of class, we've got hundreds of ducks, enough for everyone to take one home if you'd like to use that as another debugging technique, whether in CS50 or something else. But of course, now, in the age of AI, you also have the AI-powered virtual duck at cs50.ai, and also in VS Code at cs50.dev, which really is a mechanism for asking questions that you don't think you can answer on your own. So it might be reasonable to ask the duck, "What does this error message mean?" if you're having trouble wrapping your mind around it. But it's less reasonable to copy-paste your code into the duck and say, "What's wrong with my code?" You should really be meeting the AI halfway.
After all, the point of taking this or any other class is to develop that muscle memory, develop those mental models, and get some practical skills. So try hard to walk that line between asking the duck too much and deploying some of these same tools yourself: printf, debug50, even a physical rubber duck on your desk, before you resort to escalating to human or duck help. All right, so with those tools added to one's toolkit, let's actually consider and reveal what's been going on underneath the hood since last week. This was the mental model we proposed last week: when you write source code in a language like C, it's not something that the computer itself understands natively, because computers, we saw, only understand zeros and ones, aka machine code. So the compiler is the program that we use to convert your source code to the machine's code, from C to zeros and ones in this case. More generally, a compiler is just a program that translates one language to another, and in this case we're going from source code to machine code. So let's consider what's really happening. Indeed, among the goals of this week is to take a look at a lower level, so that when you encounter more interesting, more challenging problems, you'll understand from so-called first principles what the computer is actually doing and supposed to do. That way you can deductively figure things out for yourself and generally not view computers as magic, or as "I don't know how this works." You'll have a fairly bottom-up sense of how everything works by term's end, inside of any computer, laptop, desktop, phone, or the like these days. So here's the simplest of programs that we wrote last week, even though there's a lot of syntactic complexity, as we've seen. The goal is to get it to machine code, these zeros and ones here. So how has that been happening when you just run make since last week?
Well, these are the two commands that we've typically run after creating a file like hello.c. We then compile it with make hello and then we run it with ./hello. So let's give ourselves this starting point real quick, just so that we have an example in mind of exactly what it is we're compiling. So let me go back to VS Code here, close out buggy.c, and let's create a new file just like last week called hello.c, inside of which is our old friend stdio.h, int main(void), and then inside of this we'll keep it simple, just printing out hello, world, which again is my source code in C. How do I now actually compile that? Well, of course I can go down to my terminal window, make hello, ./hello, and we're off and running. So it was a bit of a white lie, though, for me to let you think last week that the compiler itself is called make. Make is a command that literally makes your program. It makes it by compiling it. But make is not technically the compiler. If we really want to get nitpicky, the compiler you've been using is actually called clang, for C language. And this is a very popular compiler, freely available, open source, so to speak. You can even look online at the code other humans wrote to create the compiler. And what make is really doing for us is essentially automating this command. So all this time I could have just run clang hello.c. But the default file name from clang, the compiler, weirdly and for historical reasons, is not going to be hello as you would hope. It's going to be a.out, for assembler output. And we don't do this in week one of the class because this just makes things unnecessarily complex, in that we're adding some random name that you just have to know to type. However, we can do this now as follows. Let me go back to VS Code here. And let me clear my terminal and type ls. And we'll see everything we've created thus far: buggy.c, which when I compiled it, I got buggy. And hello.c, which I just wrote.
And when I compiled it, I got hello. Let's do this command now manually, though. Let's use clang on hello.c and hit enter. That too seems to work. But if I now type ls, you'll see a third program specifically called a.out, which happens to be the same as hello. It just is using the default name instead of my custom name, hello. But if I do ./a.out, indeed that too will work. But the reason we don't do that, certainly in the first week of the course, is that things get a little annoying or sort of escalate quickly thereafter. So let me go ahead and change this program as we've done a few times already. Let me include cs50.h so that we get access to get_string. Let me do string name equals get_string, quote unquote, what's your name, question mark, close quote. And then down here, just like before, let me add my %s and add in my name. So I did that super quickly, but it's the same program we wrote a few minutes ago, and it's the same one we wrote last week. What happens now, though, is as follows. If I now try to do clang hello.c, enter, I actually get an error message, this one perhaps more cryptic than most. Somehow or other, I have this error: linker command failed with exit code 1, because of an undefined reference to get_string. Now, in the past when we've seen undefined or really undeclared mentions of get_string, the problem was just with missing this line. This line is clearly here. But the catch is I'm getting this error message now because when I run clang on hello.c, I'm just assuming that clang knows where to find the CS50 version of get_string. And that is not the case. Technically, if I want the compiler to compile this code for me, what I'm actually going to have to do is this.
Let me go back to my terminal window here, and I'm going to say clang hello.c, but I'm then going to specify -lcs50, which is cryptic at first glance, but this is telling the compiler to link in the CS50 library so that it knows what the zeros and ones are that belong to the get_string function. Long story short, if I hit enter now, the error message has gone away. If I type ls, I've still got a.out, but it's a new version thereof. And if I do ./a.out, now I see the new behavior where I can type in my name and see hello, David. Now, this is getting a little stupid that I keep using a.out. We can change that as well. In fact, these commands, as we're starting to see, support what are called command-line arguments. And a lot of the programs we've run already take command-line arguments. When we run code hello.c, the so-called command-line argument to code is hello.c. When I run make hello, the command-line argument to make is hello. In other words, the command-line arguments to a program are all of the words you're typing in your terminal after the name of the program itself, whether it's make or whether it's code or anything else. So this is to say, when I just ran clang hello.c -lcs50, I was passing in two command-line arguments: hello.c, which is the code I want to compile, and -lcs50, which means use the CS50 library, please. But I can add another to the mix. I can actually do something like this, whereby I do clang -o hello, then I can do hello.c, and then -lcs50, enter. Now that too seems to work. And if I type ls, I've got all the same programs as before. So let's go ahead and get rid of those to make clear what's going on. I'm going to remove a.out. I'm going to remove hello. And just for good measure, I'll remove buggy as well, so that all I have left in this folder is source code. So if I type ls, there's my two files. Let's do this again: clang -o hello hello.c -lcs50, enter.
Now if I type ls, I don't see a.out anymore, because apparently, according to the documentation for clang, the actual compiler, if you pass -o as a command-line argument followed by another word of your choice, you can name the program anything you want without having to resort to mv or clicking on it and typing a new name in manually. So if I now do ./hello, I see the exact same version, where it's just asking me for my name and then printing it out. But long story short, the whole point of this exercise is that running commands like this quickly gets very tedious. You have to remember the order in which to do it, what the command-line arguments are. I mean, this is just a stupid waste of time typically, certainly in week one of the course, to have to memorize these kinds of magical commands to get things working. But for now, know that when you run make, it's essentially automating all of that for you and making it as simple, semantically, as make hello or make buggy. But what's really happening is the make command, because of the way we've configured cs50.dev for you, is doing all of this behind the scenes. And it's not that magical. This just means change the file name to hello when you compile it. This just means compile this code. And this just means use the CS50 library. That's all. But that message about linking something in, there's something juicy going on there, such that make is in fact helping us solve a whole bunch of problems when we compile. And in fact, let me propose that we take a step back and look at some of the actual code that we're compiling. Let's consider what we actually mean by compiling. Yes, it's the case that to compile your code means to go from source code to machine code. But technically there's a few more steps involved.
Technically, when you compile your code, that's sort of become the industry term of art that really is referring to four separate processes, all of which are happening in succession automatically, but each of which is doing a different thing. So just once, let's walk through these several steps. So what is this preprocessing step? So consider this program here, which we wrote in brief last week. We've got include stdio.h, which is there because we want to be able to use printf ultimately. We've then got a prototype for this meow function. And the meow function does this: all it does is print out, quote unquote, meow, followed by a new line. Takes no input, returns no value. The main function now has a for loop that iterates three times, each time calling the meow function. And we saw this already earlier today. This line of code here, the so-called prototype, is necessary because we need to tell the compiler that meow exists before we actually use it here, especially if I don't get around to implementing it until later. So this copy-paste of that first line of code, a so-called prototype, solves that problem. This is what the header files are essentially doing for us. Before I use printf down here, the compiler needs to know what it is, what its inputs are, what its outputs are. Turns out the prototype for printf is going to be in stdio.h. And that's what that line of code has been doing for us all this time. In fact, let's take a simpler example that we keep using here, whereby I'm including cs50.h and stdio.h. And I'm using the CS50 get_string function to get someone's name and put it in a variable called name, and then I'm printing out hello, such and such. What's going on now when I preprocess this file by running make, which in turn runs clang? Well, the compiler finds on the server's hard drive the file called cs50.h, goes inside, and essentially copies and pastes its contents into my own code.
That way we get the prototype there for get_string. And we haven't seen this yet, but it stands to reason that all this time using get_string, we've been passing in a prompt like what's your name? And we've been getting back a string. What's inside the parentheses, recall, is the input. What's before the function name is the output, the so-called return value. What about stdio.h? It's in that file that printf's prototype is. So essentially what the compiler does when preprocessing this file is it finds stdio.h somewhere on the server's hard drive, goes inside, and copies and pastes those relevant lines of code into my code as well. It's to avoid me having to do all of that myself: find the file, copy-paste it, or manually type out the prototype. These preprocessor directives just automate all of that tedium. So what effectively happens at the top of my code after the file's been preprocessed is that all of those hash symbols followed by include are replaced with the actual contents of those header files. Now the compiler knows what get_string is all about and what printf is all about. That, then, is the preprocessing step. What does compiling technically mean? Compiling means taking that preprocessed code, which again looks a little something like this, and converting it into something called assembly code. And we won't spend much time in this class on assembly code, but this is how programmers used to write code. Before there was C, before there was Python and Java and all of these other modern languages, programmers were writing code like this. Before this existed, they were programming zeros and ones into the earliest of mainframe computers using punch cards and other technologies. Like literally sheets of paper with holes in them. Not very fun. Very tedious. So the world invented this. Also not very fun, very tedious. So the world invented C. Not that much fun. So the world invented Python and so forth.
We continue to sort of evolve as a species with code. But the compiler technically takes your preprocessed source code and converts it into something that looks like this. Cryptic, and that's to be expected. But there are some familiar phrases. There's mention of main. There's mention of get_string. There's mention of printf. And there's a bunch of other things: mov and push and xor and call and these other commands here. These are the assembly instructions. Those are the lowest-level instructions that the CPU inside of a computer understands. CPU is the central processing unit, the thing by Intel or AMD or Apple or other companies. Those are the lowest-level commands that the actual hardware inside of the computer understands. It's just nicer to be able to write words like main and for and printf than it would be to write these much more arcane commands that you'd have to look up in a manual. So compiling just takes C code and makes it a lower-level type of code called assembly. When I said a.out means assembler output, that's why: inside of that file is essentially the output of an assembler. All right, we're almost there. What does it mean to assemble a program, which is step three of the compilation process? That means converting assembly code to the actual zeros and ones we keep talking about. So if the file is called hello.c, when that file is assembled, the assembly code becomes the zeros and ones for your code in hello.c. But your code is not everything that composes your final program. Your code from hello.c has to be combined with code from CS50's library and from the standard I/O library that other humans wrote. I and the team wrote the CS50 code. Other humans in the world wrote the printf code in standard I/O. So essentially the fourth and final step is to link all of those zeros and ones together.
Somewhere on the server there are not just the header files cs50.h and stdio.h, but your code, hello.c, our code, cs50.c, and the code that contains printf's own implementation. Bit of a white lie: it's technically not called stdio.c, but the point remains ultimately the same. So these files have already been compiled for you in advance. This is your code. What the assembling process does is convert all of that into zeros and ones, and then all three chunks of zeros and ones are linked together. So if you think back to when I tried compiling the code without -lcs50, there was some mention of the linker. That error just means the computer did not know how to link your code with CS50's code, because we were missing -lcs50, which tells the compiler to go find it somewhere on the hard drive. And the final step, then, of linking is to combine all of those zeros and ones into one bigger blob of zeros and ones. And that's what's inside your hello program that you can execute. So long story short, these four steps are what's been happening ever since the start of last week: preprocessing, compiling, assembling, and linking. But thankfully, the world of programmers generally just treats all four of these steps as what we know now as compiling. It's just a lot easier to say compile and not worry about those lower-level details. But that might reveal better to you what all of these error messages mean when you see hints of this kind of terminology. Questions on any and all of that? From here on out, we're going to go higher level than lower. Yeah? >> I don't get the part, I think it's the assembling process, when you basically convert it to zeros and ones. Across the three different ones, don't the zeros and ones signify different things? Like one signifies text and the other signifies something else. How does the computer know which 8 bits correspond to which part? >> Really good question.
How does the computer know which of those zeros and ones corresponds to data like numbers or strings of text or actual commands? We're going to come back to that in week four of the class. But long story short, the big blob of zeros and ones we just saw on the screen actually follows some pattern, where the bits up top represent a certain functionality, the bits on the bottom represent something else, and they're organized into patterns. So long story short, we'll come back to that, but they follow conventions. It's not just a hot mess of zeros and ones. >> Other questions? >> So the preprocessing step is just replacing the hashtag? >> Correct. The preprocessing step goes into the header file and essentially copies and pastes the contents of it into your own code so you don't have to waste time doing that manually yourself. Other questions? >> Just curiosity: when you're talking about the compiling step, how it converts it to assembly code, and you're saying that the CPU understands all those commands, is the CPU then converting that into... >> No. So when you compile your code, and let me pull up the chart again, you're going from the C code to the assembly code, and the patterns you get when you see the assembly code are specific to a certain CPU. So long story short, if you're designing software for iPhones or for Android devices or Macs or PCs, you're going to necessarily use a different compiler, because given the same C code, you will get different assembly instructions in the output. And this is why you can't just take, back in the day, like a CD containing a program from a Mac and run it on a PC or vice versa, because it's the wrong patterns of instructions. But the reason why we have all of these annoying layers of complexity is because, one, four different people can now implement the notion of compiling.
Someone can implement the preprocessor, someone can implement the compiler, the assembler, the linker, and you can actually collaborate by breaking things down into these quantized steps. But also you can do this step, this step, and then two different people can write compilers to output assembly code for iPhones over here and Android devices over here. But all of us can still enjoy using the same language up here. So there's a lot of reasons for this complexity. Just understanding it is useful, but you're not going to need to use this sort of knowledge day to day, but it's what enables so much of today's complexity nonetheless. All right, so a bit of a flourish now as to what we've been doing with compiling. Well, compiling is going ultimately from source code to machine code. Couldn't you just kind of reverse the process, right? If someone wrote really interesting software like Microsoft Word or Excel or something like that, well, when I buy it or download it, I literally have a copy of all of those zeros and ones. Couldn't I just kind of reverse this process and reverse engineer someone else's code by decompiling it? And this is genuinely a threat. And this comes up in matters of law and intellectual property, because the zeros and ones have to be accessible to you and to your computer. So it's not a great feeling if someone with enough time and enough savvy could sort of reinvent Microsoft Word by just figuring out what all those zeros and ones mean. However, it's sort of easier said than done to reverse engineer code from these zeros and ones. For instance, this pattern of bits on the screen here did what, did we say, last week? Silly. No normal person should be able to answer this, but I did say it before. These zeros and ones print what? >> It just prints out hello, world. And I cannot glance at that and figure it out off the top of my head.
But if I know what architecture, what CPU this code has been compiled for, and I pay attention in week four and know what the various layouts of the zeros and ones are, I could painstakingly figure out what each of those patterns of zeros and ones means by breaking them into chunks of 8 or 16 or 32 or 64, which are common units of measure that I alluded to last week. Now, that's going to take a crazy amount of time. And the presumption is that if you are smart enough and capable enough and have enough free time to do that, it would probably take you less time to just implement Microsoft Word the normal way and just rebuild the software. It's going to take you more time to go in reverse than it would in the so-called forward direction. But there's other subtleties as well. Inside of this code are not only functions like printf, but suppose that it contained a loop, for instance, to print meow, meow, meow. Well, we know already that you can use a for loop sometimes or you can use a while loop, but they're functionally equivalent. It's sort of a stylistic decision which one you use, whichever one you're more comfortable with, or maybe feels a little better designed, but you can't figure out from the zeros and ones whether it was a while loop or a for loop, because it just results in the same pattern of zeros and ones. It's just a programmer's choice. Which is to say, you can't even perfectly reverse engineer everything, because it's not going to be obvious from the zeros and ones what the source code originally looked like. But again, the bigger deal breaker is if you have that much time and energy and savvy, just reimplement Microsoft Word itself. Don't try to reverse the whole process, which is going to be much more painstaking and time-consuming instead.
Now this is not true for all languages. And just as a teaser, in a few weeks' time when we talk about web programming and another language called JavaScript, it turns out that JavaScript source code is actually sent from web servers to web browsers, and you can look at the source code of any website on the internet: harvard.edu, facebook.com, gmail.com, it's going to be there. So not all languages, it turns out, are even compiled, typically. Sometimes the source code is just executed by the underlying computer. So we're just scratching the surface of some of the implications of all this. In a little bit of time, let's take a look further under the hood at the actual memory and solve some other problems, but I think it's now time for Cheez-Its. So let's go ahead and take a 10-minute break. Snacks are now served. See you in 10. All right, we are back. And up until now, when we've been writing code, recall that we have to specify what type of value you want to put in a variable. That's why I had to go in and add string before the word name in my first bug today. But it turns out C, as we've kind of seen already, has a whole bunch of these data types. I rattled these off last week: bool, int, long, float, double, char, string. But we'll consider for a moment just how much space each of these things takes up and see if we can't help you see what the debugger was seeing earlier, that is, what is where in memory. So a bool, it turns out, actually takes up one byte, which is kind of stupid because technically a bool, true or false, really only needs one bit. It just turns out that it's more efficient and easier to just use a whole byte, eight bits, even though seven of them are effectively unused. So a bool will take up one byte, even though it's just true or false. An int, recall, uses four bytes.
So if you want to count really high with an int, the highest you can go is roughly 4 billion, we've claimed, unless you want to represent negative numbers, in which case the highest is like 2 billion, because if you want to be able to count all the way down to negative 2 billion, you've got to kind of split the difference. A long, meanwhile, is twice that. It uses eight bytes, which lets you count as high as roughly nine quintillion, which is quite a bit more than 4 billion. That is if you want to include negative numbers as well. Then we had floats, which were real numbers with decimal points, where the size speaks to just how precise you can be with significant digits. A float is four bytes by default, but a double gives you twice as many bits to play with, which lets you be more precise. Even though at the end of the day, whether you're using floats or doubles, floating-point imprecision, as we've seen, is a fundamental problem for scientific, financial, and other types of computing where precision is ever so important. A char, meanwhile, at least as we've seen it, is a single byte, using ASCII characters specifically. And then string I'll put as a question mark, because a string totally depends on its length. If you're storing HI, that's like one, two bytes. If you're storing hello, that's like five bytes, and so forth. So strings depend on how many characters you actually want to store inside of them. So where does this go? Well, here is a picture of a stick of memory, a DIMM, so to speak, whereby on this stick of memory, which is slid into your computer, your laptop, your desktop, or some other device, there's all these little black chips that essentially contain lots of room for zeros and ones. It's somehow electronic, but inside of there are all of the zeros and ones that we can store data in.
So if we kind of zoom in on this, it stands to reason that, for the sake of discussion, if this one chip represents like one gigabyte, 1 billion bytes, we could slap some addresses on these bytes, whereby we could say this is the first byte and this is the last byte, or more precisely, this is byte 0, 1, 2, 3, dot dot dot, byte 1 billion. And it doesn't matter if it's top, down, left, right, or any other order. We're just talking about this conceptually at the moment. So in fact, let's go ahead and draw this really as a grid of memory, a sort of canvas that we can just use to store types of data like bools and ints and chars and floats and everything else. If we are going to use one byte to store like a char, well, you might use just these eight bits up here, one byte up here. If you want to store an int, well, that's four. You might use all four of these bytes, necessarily contiguous. You can't just choose random bits all over the place. When you have a four-byte value like an int, they're all going to be contiguous, back to back to back in memory like this. But if you've got a long or a double, you might use eight bytes instead. So truly, when you store a value in memory, whether it's a little number or a big number, all you're doing is using some of the zeros and ones physically in the computer's hardware somewhere and letting it permute them, turn them on and off, to represent that value you're trying to store. All right, so let's go ahead and abstract away from the hardware, though, and let's just start to think of this grid of memory in zoomed-in form and consider at a lower level what is actually being stored inside of here. For instance, suppose that we've got some code like this containing three scores on, like, problem sets. You got a 72 on one of them, a 73 on another, and a 33 on the third.
I've deliberately chosen our old friends 72, 73, 33, which recall spell HI!, or together in the context of colors is like a shade of yellow, just so that we're not adding some new random numbers to the mix. These are our old friends, three integers. Well, let's use these in a program. Let me go over to VS Code here and let me create with code a program called scores.c. That's just going to let me quickly calculate my average score on my problem sets. I'm going to go ahead and include, as we often do, stdio.h at the top. I'm going to do int main(void) after that. And then inside of my curly braces, I'm going to do exactly those sample lines of code. My first score was, let's say, a 72, my second score was 73, and my third score was 33. So I've declared three variables, one for each of my problem set scores. Now let's calculate the average. So printf, quote unquote, average, colon, just so I know what I'm printing. And now I'm going to go ahead and use maybe %i, backslash n. And then what I'm going to pass in is a bit of math. So to compute an average, it's just score1 plus score2 plus score3 divided by three. And I put the scores, the numerator, in parentheses, just like in grade school, because I need to do that operation first before doing the division. So just like math class, semicolon at the end to finish my thought. Let's see how this goes. Make scores, enter, ./scores, and it would seem that my average across these three problem sets is 72, which is great, but I don't think that's actually what I want here. What have I done wrong? It's unintentional. Yeah? >> Yeah. I'm kind of being a little generous with myself here. I didn't really factor in my worst score. So that was accidental. So now let me do this correctly.
Make scores, ./scores, and now, okay, my average is 59. But I beg to differ. I'd like to quibble: my score technically, I think, mathematically should really be 59 and a third. I'm kind of being cheated out of that third of a point. So what's going on here? Why am I only seeing 59 and not my full grade? >> You're using... so it's going to... >> Perfect. Because I'm using integers, when I divide by three, it's going to truncate everything after the decimal point, which we touched on at the very end of week one, which is an issue with just truncation in general. So one approach to fix this: I could change my %i to %f, which is the format code, it turns out, for a float, and that is what I want to print. So let's see if that fix alone is enough. Make scores. Oops, it's not. I got ahead of myself there. And let me scroll up to the error: format specifies double, but the argument has type int. Turns out you can use %f for doubles as well. So that's why it's saying double, even though I intended a float in this case. So there's a problem here. The argument has type int even though I'm passing in %f. You're seeing mention of %d here, which is an alternative to %i. We typically encourage you to use %i, because i for integer, but that is not the solution to this problem, because I want my third of a point back. So how could I go about fixing this? Well, the fundamental problem here is that I'm trying to format an integer as a float or even as a double. Well, I need to convert these scores to floats instead. So I could go in and change this to float, this to float, this to float, and heck, just to be super precise, I could add a .0 on the end of each of them just to make super clear these are floats. But there's another way.
I could, for instance, simply convert my denominator to 3.0, because it turns out so long as you involve like one float in your math, the whole thing is going to get promoted, so to speak, to floating-point values instead of integers. I don't have to convert all of them. So I think now if I do make scores, ./scores, now, ah, there's my third of a point back. There's another way to do this, just as an aside, and we'll see this again down the line, if you really want to stick with 3, because it's a little weird just semantically to divide by 3.0. That's an implementation detail, but you're truly computing an average of three things. You can technically cast the 3 to a float: in parentheses, you can specify the data type that you want to convert another data type to. And this too should make the compiler happy. Aha. ./scores. I get roughly the same answer. We're seeing some floating-point imprecision, though, nonetheless. But that too would achieve the goal here. In short, that's all just a function of floating-point arithmetic there. So what's going on now actually in the computer's memory? Let me revert back to the simpler one with just 3.0 there. And let me propose that we consider where these three things are in memory. Well, if we treat this as my grid or canvas of memory, who knows where they're going to end up? But for the sake of discussion, let's assume that 72 ended up in the top left of my computer's memory. I've drawn it to scale, so to speak, and this score1 variable is clearly taking up four bytes of memory, and it's an int. And that's typically how many bytes are used on systems. Technically, it depends on the exact system you're using, but nowadays it's pretty reasonable to assume that an integer will be 32 bits on most modern systems. Score2 is probably over there. Score3 is probably over there. So I'm using 12 bytes total, four bytes for each of these values.
All right, so that's really all that's going on underneath the hood. I don't have to worry about this. The compiler essentially figured out for me where to put all of these things in memory. But what really is in memory? Well, technically each of these variables, if it's composed of 32 bits, is really just a pattern of literally 32 zeros and ones. And I figured out the pattern here. I crammed them all into the space there. But you see here three patterns of 32 bits, which collectively compose those numbers there. But let's consider design now in terms of my code. This gets the job done. It's not that bad or big of a deal for just calculating the average of three scores. But this should also start to rub you the wrong way, this week onward, when it comes to design. Like, this is correct, especially now that I clawed back my third of a point, but this is bad design, using the variables in this way. Why might you think? Yeah? >> You're going to have to type in each score manually, assign each variable individually. >> Yeah, I'm going to have to type in each score manually with each passing week when I get the fourth problem set and the fifth. I mean, surely people who came before us came up with a better way to solve this problem than manually creating 10 variables, 20 variables, whatever it is by the end of the semester. It just feels a little sloppy. And indeed, that's often the way to think about the quality of something that's designed. Think about the extreme. If you don't have three scores, but 30 or 300, is this really going to be the best way to do it? And if you feel like, no, no, there's got to be a better way, odds are there is, certainly if the language itself is well designed. So let's consider how else we might go about solving this. Well, it turns out we can treat our canvas of memory, that grid of bytes, as chunks of memory known as arrays.
An array is a chunk of contiguous memory, back to back to back, whereby if you want to store three things, you ask the computer for a chunk of memory for three things. If you want 30, you ask for one chunk of size 30. If you want even more, you ask for a chunk of size 300. Chunk is not a term of art. I'm just using it colloquially to explain what an array actually is. It's a chunk or a block of memory that is back to back to back to back. So what does this mean in practice? Well, it means that we can introduce a little bit of new syntax in C. If I want to create one variable instead of three, and certainly one variable instead of 30, I can use syntax like this. Hey compiler, give me a variable called scores, plural. Give me room for three integers therein. So, it's a little bit of a weird syntax, but you specify the type of all of the values in the array. You specify the name of the array, scores in this case, and I pluralized it just semantically because it makes more sense than calling it score now. And then in square brackets, so to speak, you specify how many integers you want to put into that chunk of memory. So, this one line of code now will essentially give me 12 bytes automatically, but they'll all be referable by the name scores, plural. So, let's go ahead and weave this into some code as follows. Let me go back to VS Code here, clear my terminal, and now let's just whip up the same kind of program, but get rid of these three independent variables. And instead, let's go ahead and just say int scores[3]. Now, I need a way to initialize the three values. But this I can do too. It turns out that if I want to put three values in this, I just need slightly new syntax.
I can say scores[0] = 72, scores[1] = 73, scores[2] = 33. So it's not all that different from having three variables, but now I technically have one variable and I am indexing into it at different locations: location 0, 1, and 2. And it's zero because in computing we always start counting from zero. So scores[0] is going to be my 72 problem set, scores[1] is my 73 problem set, and scores[2] was my weakest, my 33 problem set. Now my syntax down here has to change because there are no more score1, score2, score3 variables, but there are scores[0] plus scores[1] plus... And notice what VS Code is trying to do for me. It's saving me some keystrokes. As I type in scores and type one single bracket, notice it finishes my thought for me and magically puts the cursor where I want it so I can put the 2 right there and generally save on keystrokes. But that has nothing to do with C. It just has to do with VS Code trying to be helpful. So I think now if I go down here and do make scores, ./scores, we get the same answer, but it's arguably better designed because I now have one variable instead of three, let alone many more. And in fact, if I wanted to change the total number of scores, I can just change what's in that initial square bracket. So if we consider what's going on now, if we look at the computer's memory, it's the same exact layout, but there are no more three variable names. There's one: scores[0], scores[1], and scores[2]. And notice here, ever more important, an array's values are indeed contiguous, back to back to back. Now, the screen is only so wide, so they kind of wrap around to the next row of bytes, but the computer has no notion of up, down, left, right. I mean, it's just a piece of hardware that's got lots of available bytes that can be addressed from the first byte all the way down to the last byte.
The wrapping is just a visual artifact on this here screen. All right. So if I've done this now, maybe we can make this program a little more dynamic than just hard-coding in my scores. Let me go in and add the CS50 library's header so that we can also use, for instance, get_int and start getting these scores dynamically. So I could do get_int and I could prompt the user for a score. I could use get_int again and prompt the user for another problem set score. I can use get_int a third time and prompt the user for a third such score. And then pretty much the rest of my code can stay the same. Let's do make scores again. ./scores: 72, 73, 33. And now my program's a little more interactive. Like, this doesn't work for just my three scores. It could work for anyone's scores in the class. Now this too hints of bad design. I like my introduction of the array because I now have one variable instead of three. But what now might rub you the wrong way among lines 7, 8, and 9? Yeah? >> It's repetitive. I mean, I typed it manually, but I might as well have just copied and pasted literally the same thing. So, what's a candidate for fixing this? Like, what programming construct might clean this up? Yeah? >> Yeah, we could use a for loop or a while loop or whatever, but a for loop would get the job done. And that's often my go-to. So, let's do that instead. Let's go under my declaration of the array and do for (int i = 0; i < 3; i++), which we keep seeing again and again. Now how do I index into the array at the right location? Well, here's where the square brackets are kind of powerful. I can just say my scores array at location i should get an int from the user as follows. So now I'm using get_int once inside of a loop, but because i keeps getting incremented, as we've done many a time now for meowing and other goals, I'm putting the first one at location zero. Why? Because i is initialized to zero. I'm putting the second one at location one.
Why? Because I'm going to ++, or increment, i on the next iteration, then the next iteration. So, this has the ultimate effect of putting these three scores at location zero, one, and two instead of me having to type all of that out manually. Now, I don't love how I've done this still. If we really want to nitpick, this solves the problem correctly, but it's still got a poor design decision. It's got a magic number, as people say. What is the magic number here and why is it bad? Yeah, over here. >> Yeah, it was a little soft, but I think the number three is hardcoded in two places. We've got it on line six, which is the size of the array, and then again on line seven, which is how many times I want to iterate. Those are the exact same concept, but it's on the honor system that I type the number three correctly both times. So, I think we can fix this a little better. I could do something like int n = 3, and then I could use n here and then I could use n here so that now I only change it in one place. If your eyes are wandering to the bottom of the program, there's still a problem here because I've still hardcoded 0, 1, and 2, but we'll come back to that. But this is arguably a little better. But let's talk a little bit about style. Typically, when you've got a variable that should not change its value, we saw last week that we should declare it as constant, and the trick there is to literally just write const, for short, in front of the type of the variable. Now it should not be changeable by you, by a colleague, a collaborator, or the like. But typically too, by convention, stylistically, to make visually clear to another programmer that this is a constant, it's convention also to capitalize constants. So I'd actually use a capital N here in all places, just to make clear visually that there's something interesting about this variable, and indeed it is a constant that cannot be changed.
All right, with that refinement, I don't think we've really improved the program fundamentally. I think we're going to need to do a bit more work to do this really well. So, I'm going to do this a little quickly, but mostly to make the point that we can indeed make this more dynamic. So, let me hide my terminal window there. Let me go ahead now and get the scores as I already am, as follows here. And let me assume for the sake of time that we have a function that exists already called average, and I simply want to pass in to that average function the scores whose average I want to calculate. Now, average does not exist off the shelf. Like, I can't just use an existing library for it. I'm going to have to implement this thing myself. But how? All right. Well, let's go ahead and do this. At the top of my file, I'm going to go ahead and define a function called average that takes in what? An array of numbers. So, this syntax is going to be a bit new, but the way I do this is int, say, array[]. Or array sounds a little too generic. Let's just call it numbers, for instance, here. So that says my average function is going to take as an argument an array of numbers. This average function, though, should return a value too. And it should return what type of value, from what we've seen thus far? A number, a float specifically. It could be int, but then I'm going to get shortchanged my third of a point, potentially. So I think I want it to return a float. Or if you really want precision, you could return a double just to be really nitpicky. But that seems excessive here. All right. Well, now inside of my average function, how can I calculate the average? Well, this is just kind of like a math thing. So, I could declare a variable called sum and set it equal to zero. I could then have a for loop inside of this function: for (int i = 0; i less than... huh. I'm going to come back to this: the number of numbers in the array.
And then I'm going to do i++. And then on each iteration, I'm going to do sum equals whatever the current sum is plus whatever is in the numbers array at that location. So I'm going a little quickly, but again, I'm just applying the same lesson learned. numbers is my array. numbers[i] means go to the i-th location in there. But if my loop starts at zero, that means go to location zero and then one and then two. And heck, if there are more scores in this array, it's just going to keep going on up from there because of the ++. But I hesitated here for a couple of reasons. So I put a to-do here, which is not a thing. That's a note to self. How far do I iterate? Well, if you've come into CS50 having programmed before, you can usually just ask an array, aka a vector, what its length is, in Java and in Python and the like. You can't do that in C. So if I want to know what the length of this array is, the function has to be told. So I'm going to additionally propose that this average function can't just take the array. It's also going to have to take another argument, a second input, for instance, called length, that tells me how long it is. And then down here, which is where we started the story, when I use this so-called average function, I'm going to have to tell the average function, by passing in N, how many numbers are in that array. It's annoying that you have to pass in not only the array but also its size separately, but that's the way it's done in C. More recent languages have improved upon this, so you can just figure out what the length of the array is, as we'll see in a few weeks in Python. All right, back to the average function at hand. I think we're almost there. This is a little unnecessarily verbose. Recall that we can tighten this up by just doing += whatever is in numbers[i]. That's just tightening it up. It's syntactic sugar, so to speak. And then the last thing I'm going to do in my average function is what?
Actually calculate the average. So what is the average? It's just the numerator, the sum of all of the scores, divided by the total number of scores. Well, I've got the sum. So, I think I just want to do sum divided by what to get the actual average now? >> Yeah. >> Exactly. Sum divided by length will give me the average because the sum is the numerator, effectively all of the scores added together, and the denominator is the length. How many numbers were there actually? Now, I can't just write this math expression here. If this is going to be my function's return value, and we've done this once or twice before, I literally say in my average function, return this value. So, it hands back the work. I could use printf and just print it on the screen, but I don't want that visual side effect. I want to hand it back so that on line 23, I can simply calculate the average of those N scores and let printf use it as the value of that format code, %f. All right. I think we are in reasonably good shape. Let me cross my fingers now and hope I didn't screw this up. make scores. Okay. ./scores. How many do we want to do? So we'll do 72, 73, 33. Enter. And there is... oh, so close. Average. I've had a regression. I've made the same mistake again, just in a different way. I think I saw your hand go up. Why am I getting 59 and I'm not getting my third of a point? >> Yeah, in this return line, on line 11, right now I'm again stupidly doing integer divided by integer. That will make us suffer from integer truncation because the result of dividing an integer by an integer is itself an integer, so there's no room for the decimal point or any numbers thereafter. So, how do we fix this? Well, I could change the sum to a float. Like, that would be reasonable. So then I'd do a float divided by the length. I could do my casting trick, like convert the length to a float just for the sake of floating-point arithmetic.
There's a bunch of ways to solve this, but I think I'll go with this one. Now let me do make scores again, ./scores, 72, 73, 33, and now I've got, albeit with some imprecision, I think enough precision, certainly for like a college grade in this case: 59.33 and so forth. Okay. So what are the things to actually care about here? There's a decent amount of code here. Most of it is stuff we've seen before, but the interesting parts, I would propose, are these. When you create your own function that takes an array as input, you have to take as input the length of the array. You're not going to be able to figure it out yourself, as you can in more modern languages. You also need, of course, to pass in the array itself. How do you pass in an array? Well, when you're defining the function, you specify the type of the values in the array, whatever you want to name the array inside of this function, and then you use empty square brackets like this. You don't have to put N or some other number there. All you need to tell the compiler is that my average function is going to take some array of values. Specifically how many, you don't put inside the square brackets there. Then when I use it, now it's just the now-familiar syntax. When you want to index into your array, that is, go to location zero or one or two, you just use square bracket notation here. But the array itself, recall, was actually created in main when I did this line of code here, where I said, give me an array called scores, each of whose values is going to be an int, and I want this many of them. And so maybe the final flourish that I'll add here, just to be sort of nitpicky, is this. I keep saying that main should really go at the top. Fine, no big deal. Let me highlight my average function, move it to the bottom of my file just because, and then, and only then, I'll copy and paste that first line, the so-called prototype, so that Clang doesn't freak out by not knowing what the average function is.
So in short, there's seemingly a bunch of complexity here, but the only thing that's really new in this one example is this: this is how you pass to a function an array that already exists elsewhere, not just by its name, but with the square brackets there. Okay, questions on arrays or any of this new syntax? Yeah? >> I'm a bit slow, but back when you did the whole average thing, >> okay, >> you said that we could store it as a float, >> and instead of saying 3.0 was a float, you just said it works because 3.0 is a float. How does it know it's not a double? >> Oh, how does it know it's not a double? So, by default, if you just type a number like 3.0 into your code, it will be assumed to be a double, just because raw values, literal numbers with a decimal point, will be treated by the compiler as doubles and be allocated 64 bits. >> So how come you still do %f? >> Just because the world did not need to create a new format code. Like, %d is not double. %d is decimal integer, but don't worry about that. We tend not to talk about it too much in class. %i is integer. %f is float. But %f is also double. And this is not consistent, because what's a long? What did I say last week? %li gives you a long integer. It's just a mess. There's no good reason for this other than historical baggage. >> Thank you. >> Sure. I'm not sure if that's reassuring, but... All right. So, okay. Let's use this knowledge for something useful now and actually tease apart how we can use these skills for good and to better understand what's going on inside of the computer, as follows. Let me go over to our grid of memory, and this time let's not store some numbers, but let's store these three lines of code, these three variables.
So, three chars, even though you know where this is going. Like, this is not good design because I've got three stupidly named variables, c1, c2, c3, but let's make a point first. The first variable's value is, quote unquote, H. The second is I. The third is an exclamation point. Why, though, am I using single quotes suddenly instead of double quotes? >> It's a character. Chars use single quotes. Strings use double quotes. And we'll see why the distinction in a moment. So for instance, if this is my grid of memory and this program contains just three variables, each of them a char, odds are they'll end up like this in memory. c1, c2, c3: H, I, exclamation point. Assuming there's nothing else going on in my program, they're just going to end up being back to back to back in this way, even though they might not necessarily be laid out in this way. So what does this really mean is going on? Well, let's go ahead and poke around. Let me go back to VS Code here. Let's close scores.c, reopen my terminal, and let's create a new program called hi.c and just do something playful. So let me include stdio.h at the top. Let me do int main(void) after that. And inside of my curly braces, let's just repeat this. char c1 = 'H', in caps. char c2 = 'I', in caps. And then char c3 = '!', an exclamation point. That's all. Now, let's actually poke around and see what's inside the computer's memory. So, I could do something like this. I could printf, for instance, %c %c %c\n, and %c, it turns out, means character. So, what do I want to plug in? c1, c2, and c3, semicolon. So, let's go ahead and do this. make hi, Enter, ./hi, and voila, there's my H I exclamation point. There's no magic here. Like, I'm literally just printing out three char variables. I don't need the spaces. If I want to get rid of those spaces between the characters, I can remake this. make hi, ./hi. And now we're back in business. HI, exclamation point.
But here's where an understanding of types can give you a bit of power and sort of satiate some curiosity. What if I change my %c to %i? %i, %i, %i, casting each value with (int). Well, it turns out that a char is really just a number because it's an ASCII value stored in one byte, so a number from 0 to 255. So there's nothing stopping me from telling the compiler, don't print these as chars, print them as integers. So let's do make hi, ./hi. Enter. And that's a little cryptic. It looks like it's saying 727333, but no, let me add those spaces back in between each of those placeholders. make hi again, ./hi. There are our old friends, 72 73 33. It is not necessary in this case to say (int), (int), (int), because the compiler is smart enough, and printf is smart enough, that if you hand it a value that happens to be a char, it knows already it's going to be an integer, essentially. So you don't even need to bother explicitly casting it this way. We're essentially implicitly casting it to an integer by using those format codes as such. All right, so that just proves that what I've claimed is the case: this equivalence between characters and numbers actually holds inside of the computer's memory. So even though you're storing HI, exclamation point, technically you're storing three patterns of eight bits each that give you these decimal numbers 72, 73, and 33, or specifically these patterns here. All right, then what is a string? And this is where things get a little more interesting. A string, as we've used it, is like a whole word or a phrase or, when we started class today, like a whole paragraph of text. So that's multiple values. Now why is that interesting for us, potentially? Well, let's go ahead and write one line of code with a string. So here, for instance, is one line of code with a string. Let's go ahead and put that into my program. So I'm going to go back to VS Code here and clear my terminal. And I'm going to go ahead and delete all of this code here for a moment.
And I'm going to do something like this. string s = "HI!", with double quotes now. And now, just like in week one, I'm going to print out %s\n and print out the value of s. Per earlier, because string is technically one of our training wheels for just a few weeks, I'm going to additionally include cs50.h at the top so that the compiler knows what this word string is. All right, let's go into the terminal. make hi, ./hi, Enter, and we're back in business, printing that out now as an entire string. Well, what's going on inside of the computer's memory this time? Well, I still have HI, exclamation point, but it's a string now. Well, it turns out the way that's going to be laid out in the computer's memory is exactly like before. There's no mention of c1, c2, c3 because those variables don't exist. There's just one variable, s, but it's referring to three bytes of memory, it would seem: H, I, exclamation point. And you can kind of see where this is going. Like, a string, as a spoiler, turns out to be actually just what? An array. >> It's just going to be an array of characters. Hence the dots we're trying to connect today. So at the moment, though, this is a single variable s, a string, the value of which is HI, exclamation point. But you know what? If it is in fact an array, I bet we can start playing around with our new square bracket notation and see as much in our actual code. So in fact, let me go ahead and do this in VS Code. Now let's not use %s. Let's use %c, %c, and %c, three times. Then instead of just s, let's print it out like it is an array: s[0], s[1], s[2]. Let's go back to my terminal in VS Code: make hi, ./hi. And nothing has changed, but I'm printing it out now one character at a time because I understand what's going on underneath the hood. In this case, I can actually see these values.
Now, let's go ahead and change the %c to %i and add a space just so it's easier to read. %i, space, %i, space, %i. I don't need my casts in parentheses because printf is smart enough to do this for me. make hi again, ./hi. There again is my 72 73 33. However, that came from the mere fact that I put in double quotes HI, exclamation point. So, what's really happening here is that a string is indeed just an array of characters. But how does the computer, when doing %s, know what to actually print? In other words, it stands to reason that eventually, if I've got more variables, more code, there's going to be other stuff in the computer's memory. How does printf know, when using %s, to stop here and not just keep printing characters that are over here? Especially if I did have more variables and more stuff in memory. Well, let's take a look at what's just past the end of this array. Let's go back to VS Code. And now let's get a little crazy and add in a fourth %i. And even though this shouldn't exist, let's do s[3], which, even though it's the number three, is the fourth location, but HI, exclamation point, is only three values. So, let's look one location past the end of this array. make hi, ./hi. Interesting. It seems, and maybe it's just luck, good or bad, that the fourth byte in the computer's memory seems to be a zero. Well, that's actually very much by design. And it turns out, if we look a little further, by convention what the compiler will do for us automatically is terminate, that is end, any string we put in double quotes with a pattern of eight zero bits. More succinctly, it's just the number zero, because if you do out the math, you've got eight zeros, which gives you zero in decimal. Or more technically, the way it's typically written is this, because it's not like the number zero that we want to see on the screen: \0, which, similar to \n, is sort of a special escape sequence.
This just means literally eight zero bits, not the number zero that you might see in a phone number or something like that. So even though we said string s = "HI!", with an exclamation point, seemingly three characters, how many bytes does a string of length three actually seem to take up in memory? It's actually going to be four. And this happens automatically. That's what the double quotes are doing for you. They're telling the compiler, "This is not just a single character. This is a sequence of characters. Please be sure to terminate it for me automatically with a special pattern of eight bits." And that special pattern of eight bits actually has a name. It's the so-called null character, or NUL for short. The null character is just a byte of zero bits, and it represents the end of a string. You've actually seen it before, if super briefly, two weeks ago. Here was our ASCII chart, and we focused mostly on this column here and this column here, and then we looked at the exclamation point over here. But all this time, over here, ASCII character zero is NUL, N-U-L, which is just how you pronounce all eight zero bits. It's been there this whole time. So why is it done this way? Well, how is the computer actually printing something out from memory? Well, it needs to know where to stop. printf is pretty stupid. Odds are, inside of printf there's just a loop that starts printing the first character, the next character, the next character, and it's looking for the end of the string. Why? Well, consider what might happen. Suppose you've got a program that has not just one string, but two. For instance, two strings like this. So, in fact, let me go back to VS Code here, clear my terminal, and let's just make this program a little more interesting for a moment. string t = "BYE!", for instance. And then down here, let's do two printfs. %s\n, and print out s. printf, %s\n, and print out t.
Now to be clear, %s means string placeholder. t and s are just also the names of the variables. There's no %t that we want to use here. All right, let me go down to my terminal, make hi, and voila, I get HI! and BYE!, just like you would have expected last week. But what's going on inside of the computer's memory? Well, insofar as I have asked it to create two variables, s and t, like this, odds are what's happening in the computer's memory is that HI! is ending up here, aka s. t, because there's nothing else in this program, is probably going to end up here: B, Y, E, exclamation point, but it wraps on this particular screen. t is taking up one, two, three, four, five bytes total, just as HI! is taking up four bytes total, because the compiler is automatically adding for me the \0, the null character, to make clear to other functions where this string ends. So what does this mean in real terms, and why is it zero? Well, why is it zero? Just because, at the end of the day, all we have is bits. We've got eight bits to work with for chars. You've got to pick some pattern. We could have chosen all ones. We could have chosen all zeros. We could have chosen something arbitrary. A bunch of humans in a room years ago decided eight zeros will mean the null character. That's the special character we will use to terminate strings in this way. Well, what does that mean with our new syntax? Well, it means we could poke around with strings as well. So, even though that first variable is s and that second one is t, you could technically poke around and access s[0] and s[1] and s[2] and s[3]; t[0], t[1], t[2], t[3], and t[4]; and so forth. So, in fact, if I wanted to dive in deeply there and actually see that, well, let me go ahead and do this. Back in VS Code here, let me make a refinement. I've now got my two strings here. I could go and, for instance, down here, just like before, use %c, %c, %c, and then %c, %c, %c, %c.
And if I then do s[0], s[1], s[2], whoops, 2, and then down here t[0], t[1], t[2], t[3], and I'm doing that only because the word BYE! is longer than the word HI!, then if I do make hi, the same principles work even in this context here. But let's add an interesting twist. Well, if I've got two words in memory, I could use them in an array too. Instead of having like s and t, or word one and word two, I can actually put strings in an array, too. So, let's go ahead and do this. Let me go back to VS Code. And just for fun now, let's go ahead and do this. Give me an array called words that's going to fit two strings. Then in the first of the words, words[0], put HI!. Then in words[1], put BYE!. The only thing new here is that I'm making an array of strings now instead of an array of ints. But all of the syntax is exactly the same. How can I go about printing these things? Well, just as before, I can do printf, %s\n, and print out words[0]. Then I can do printf, quote unquote, %s\n, words[1]. And again, I'm just sort of applying the same simple syntax that we saw before. ./hi again, for like the sixth version of this program, right? I'm just sort of jumping through hoops syntactically to demonstrate that these are just different lenses through which to look at the exact same idea. And while a normal person would not do this, we could think about what's really going on in memory with arrays of words when those words themselves are arrays of characters, because a word is just a string. So this code here, in that program a moment ago, gives us something like this in memory. This is words[0]. This is words[1]. The only thing that's different is I'm not calling them s and t. I've given them one name with two locations, 0 and 1.
Well, if each of these values is itself a string, well, you said earlier that a string is just an array. So we can actually think of these two strings, even though the syntax is getting a little crazy, using two sets of square bracket notation, where I can index into my array of words and then index into the individual letters of that word by just using more square brackets. And again, this is just to demonstrate a point, not because a normal person would do this. But if I go back to VS Code, instead of printing out these two strings, why don't I do something like this? printf, quote unquote, %c%c%c\n. Then let's print out the first word, but the first character therein. Let's print out the first word, but the second character therein. The first word, but the third character therein. And even though I'm saying first and second and third, it's 0, 1, and 2 respectively because we start counting at zero. And then lastly here, we can print out the second word. %c%c%c%c\n, then words[...]. How do I get to the second word in this array? words[1], the first character therein. words[1], the second character therein. words[1], the third character therein. words[1], the last character therein. And again, this is just to demonstrate a point, but if I do make hi now, ./hi, we have full control over everything that's going on, if you now agree and understand that an array can be indexed into with square bracket notation, as can a string, because a string is itself just an array. Strings are arrays for today's purposes. Questions, then, on any and all of these tricks? No? All right. Yeah, in front. >> How do you establish an array? >> How do you establish or create an array? Well, in the context of this program, if I go back to VS Code, line six here gives me an array of size two, an array of two strings, if you will.
The previous example we were playing with, which was my scores, uh, whoops, wrong program, wrong file. If I open up scores.c as before, this line here, line nine, gives me an array of n integers. So, that is what establishes or creates the array in memory. You specify a name, the size, and the type. That's all. And the only thing that's new today again is the square bracket notation, which in this context creates an array of that size. But once it exists, you can then access that chunk of memory by using square brackets as well. Other questions on arrays? Yeah, in front. >> All the values in the array as you declare it, or do you need to go in index by index to declare? >> Good question. Do you need to go index by index to put things inside of an array? Short answer, no. So, let me open up again scores.c from before, and what I could have done in an earlier version of my program would be something like this. I could have done 72, 73, 33. And I deliberately didn't show this because I didn't want to add too much complexity, but you can use curly braces in this new way and initialize the array in one line. And in that case, you don't even need to specify the size because the compiler is not an idiot. It can figure out that if you've got three numbers on the right, it only needs three elements on the left to put them into. But let me undo that and leave it just as I did. But short answer, yes. You can statically initialize an array if you know all of the values up front, and not when using get_int. All right. So, if you're on board with the idea that all a string is is an array, and that array is always null terminated, we can now use that knowledge to solve some simple problems, and problems that others have already solved before us. So, let me go ahead and close that file in VS Code. Let me go ahead and open up another program here called length.c. And let's just play around with the length of strings as follows. Let me include the CS50 library at the top.
Let me include stdio.h after that. Let me do int main void after that. And then inside of main, let's prompt the user for their name by using get_string and just say name, colon, today. And then after that, let's go ahead and figure out the length of the person's name. Like D-A-V-I-D, I should get the answer of five. And K-E-L-L-Y, we should get the answer of five. And hopefully for a longer or shorter name, we'll get the correct answer as well. So, how can I go about counting the number of characters in a string? Well, the string is just an array, and that array ends with the null character. There's a bunch of ways we can do this, but let me go ahead and do this. Let me create a variable called n, which eventually will contain the length of the name. And I'm going to set it equal to zero because I don't know anything yet about the length. Then, I can do this with a for loop, but I prefer this time to use a while loop. I'm going to say the following. While the person's name at location n does not equal backslash zero, go ahead and add one to the value of n. And then after all of this, go ahead and print out with percent i, backslash n, the value of n. So what's going on here? This is easier said when you know already where you want to go with it, but with practice, you too can bang this out pretty quickly. n is going to contain the length of my string. I have in my loop here a boolean expression that's just asking the question, does name at the current value of n not equal the null character? In other words, you're asking yourself, is this character null? Is this character null? Is this character null? Is this character null? And if not, you keep going. You keep going. And this is kind of a clever trick because I'm using n and incrementing it inside the loop. So when I look at D, that's not equal to backslash zero. So I increment n. Now n is one. So I look at name bracket one. What's at name bracket one if it's my name? A. A does not equal backslash zero. So it increments n.
What's at location two in D-A-V-I-D? V. V does not equal backslash zero. So we repeat with I. We repeat with D. And then we get to the end of my name, which is the null character, because the get_string function in C put it there automatically for me. The null character does equal backslash zero. n does not get incremented any more. So at this point in the story, on line 13, n is still five because I have not counted the null character. So I hope I will see five on the screen. This is just a very mechanical way of checking, checking, checking, checking, trying to figure out through inference how long the string is, because it's as long as it takes to get to that backslash zero, the null character. So, let's do make length, Enter, ./length. Type in my name, David. And I indeed get five. Let's go ahead and do ./length, Kelly. I indeed get five. And hopefully for shorter and longer names, I'm going to get the correct answer, too. In fact, we can try a corner case. ./length, Enter. Let's not give it a name at all. If I just hit Enter here, what should the length of the person's name be? Zero. Which is not incorrect. It's literally true. But that's because we're going to get back essentially quote unquote, nothing between the quotes. But even though it's quote unquote in the computer's memory, it's still going to take up one byte because the get_string function will still put null at the end of the string even if it's got no characters therein. So it turns out this is not something you need to do frequently, like initializing a variable and using a loop like this. It turns out there are better solutions to this problem. You do not need to reinvent this wheel yourself because it turns out, in addition to stdio.h and cs50.h and, as you probably saw in problem set one, math.h and perhaps others, there are other libraries out there, namely the string library itself.
In fact, if you go into the CS50 manual, you can look up the documentation for a header file called string.h, which contains declarations, that is, prototypes, for a whole bunch of helpful functions. In fact, the manual pages for it are at this URL here. The most important function, and the one we're going to use so often for the next few weeks, is wonderfully called strlen, for string length. Someone else literally decades ago wrote the code that essentially looks quite like this but packaged it up in a function that you and I can use. So we don't have to jump through these stupid hoops just to count the length of a string. We can just ask the string length function what the length of a string is. But odds are if we looked at the C code that someone wrote decades ago, it would look indeed quite like this. So how can I simplify this program? Well, I can get rid of all of this code here. I can include string.h at the top of my file. And then I quite simply could do something like this. int length equals strlen of name. That's going to put the length in the variable length. Actually, let's be consistent. int n equals strlen of name. And then on line nine, let's print it out. Let's try this. Make length, ./length, David. Okay. Kelly. Okay. And no one. And zero. It seems to now be working. So this is a wheel we do not need to reinvent. And frankly, now as a matter of design, I don't really need the variable n anymore. Recall that we can nest our functions just like we did with average before. So let me get rid of that line and just say strlen of name, which is actually perfectly reasonable here. All right. Well, what more can we do with this? Well, let's consider some other matters of design. Let me close out length.c and let's create another program of our own called string.c in which we'll play around now with this library and others. Let me go ahead and include cs50.h. Let me go ahead and include stdio.h. Let me go ahead and include also string.h.
All right, what do I want to now do? Well, int main void, and inside of main, let's go ahead and write a program that prints a string character by character just to demonstrate these mechanics. So, string s equals get_string, and I'm going to ask the user for some input because I just want to play around with any old string. I'm going to go ahead and proactively say output here, and I'm going to deliberately not use a newline character there. Below this, I'm now going to have a for loop, though I could use a while loop, that says int i equals zero, i is less than strlen of s, the string I just got from the human, and increment i on each iteration. And on each iteration, print out just one character in that string, specifically at s location i. And then at the very bottom of this program, let's just print a single backslash n to move the cursor onto a new line. Long story short, what have I done? I wrote a stupid little program that prompts the user for a string, prints the word output thereafter, and then it just prints the word that they typed in character by character by character by character until it reaches the end of the string based on the length returned by strlen. So, let's go ahead and run this in my terminal window. I'm going to do make string, ./string, and I'll type in my own name as before. This was a subtlety. I deliberately wrote two spaces here because, just to be nitpicky, I wanted input and output to line up perfectly, so you can see what's happening. Indeed, if I hit Enter here, now I see the input is David. The output is David as well. So that was just a formatting trick that I foresaw. Why is this program correct but arguably not well designed? It's pretty good in that it's using the strlen function. I didn't reinvent the wheel unnecessarily, but there's an inefficiency that's kind of subtle. And it relates to how a for loop works. Any thoughts? This program, I claim, is doing unnecessary work somewhere. Yeah.
>> Why do you have to character? >> Okay, that's definitely stupid. Um, you don't have to output it character by character. That's just my pedagogical decision here. So, correct, but not the answer we're fishing for. There's a second stupid thing. Yeah. >> Yes. Every time through this loop, and this isn't so much my conscious choice but my mistake, I'm checking the length of s again and again. Why? Because recall how a for loop works. The initialization happens once at the very beginning. Then you check the boolean expression. Then if it's true, you do the code. Then you do the update. Then you check the boolean expression. Then you do the code. Update, boolean expression, you do the code. But every time you evaluate this boolean expression, you're asking, is i less than the strlen of s? But this is a function call. You are literally using strlen again and again and again, and like a crazy person, you're asking the computer, what's the length of s? What's the length of s? What's the length of s? It's not going to change. It's going to be the same no matter what. So how can we fix this? Well, I could solve this in a couple of ways. I could, for instance, down here do int n equals strlen of s and store it in a variable n and just use that. I think that eliminates the inefficiency because now I calculate the length of s once. It's not going to change, nor is my variable. So I can now use and reuse that variable. It's just saving me a little bit of time, you know, microseconds maybe. But when you're writing bigger programs and you're doing things in loops, if that loop is running not three times or five, but a million times, millions of times, all of those microseconds and milliseconds might very well add up. But it turns out there's some syntactic tricks we can do, too. I alluded to this earlier. If you want to initialize not one variable but two, you can actually do it all before the first semicolon, like that.
So now on line 9, I'm declaring a variable called i and setting it equal to zero. And I'm declaring a second variable called n, also the same type, int, and setting it equal to the length of s. And now I can use that again and again. Now, as an aside, this is a little bit of a white lie because smart compilers nowadays are so advanced that they will notice that you're calling strlen again and again inside of a loop and they will just fix this for you, unbeknownst to you. But it's representative of a class of problems that you should be able to spot with your own human eyes and avoid altogether so that you don't waste more time and more compute and more money, in some sense, than you might otherwise need to in this case. Any questions on that optimization? Yeah. >> You do not say int. Again, the constraint is that you have to use the same data type for all of your initialization. So, you better hope that you only want ints; otherwise you've got to pull it out and do what I did earlier. Good question. Others on this? Yeah. >> When does it spaces? >> When does it account for spaces? A space is just a character, ASCII character number 32. So there's nothing special about it. It's sort of invisible, but it is there. It is treated like any other character. There's no special accounting whatsoever. The null character, which is also invisible, is special because printf and strlen know to look for it as the end of that value. All right, let's try one other demonstration of some of these ideas here. Let me go into another file that we'll create called, how about, uppercase.c. Let's write a super simple program that uppercases a string that the human types in and see how we can do this sort of good, better, and best. So I'm going to call this file uppercase.c. Inside of this file, let's use our now friends: include cs50.h. Let's do include stdio.h. Let's then include lastly, how about, string.h.
And the goal here inside of main is going to be to get a string from the user. So string s equals get_string. And we're going to ask the user for a before string, representing what it is they typed before we uppercase everything. Then I'm going to go ahead after that and print out, just as a placeholder, after and two spaces, just to be nitpicky so that the text lines up vertically on the screen. Now I'm going to do the following: for int i equals zero, n equals strlen of s, semicolon, i less than n, just like before, i plus plus. So I'm just kicking off a loop that's going to iterate over the string the human typed in. Now if my goal in life is to change the user's input from lowercase, if indeed in lowercase, to uppercase, let's just express that literally. If the current character in the string, so s bracket i, is greater than or equal to quote unquote a, and s bracket i is less than or equal to quote unquote z, using single quotes. This is arguably a very clever way of expressing the question, is it lowercase? We know from our ASCII chart from week zero that the ASCII chart has not only numbers representing all the uppercase letters but also numbers representing all the lowercase letters. Lowercase a, for instance, is 97, and they are all contiguous thereafter. So we can actually treat, just like we did before, chars as ints and ints as chars and ask mathematical questions about these chars and say, is s bracket i between a and z inclusive? So if it is lowercase, and I'll add a comment here for clarity, if s bracket i is lowercase, what do we want to do? We want to force it to uppercase. So this is a little trick I can do as follows. Printf the current character. But let's do some math on it. Let's change s bracket i by subtracting some value. What might that value be? Well, recall from week zero our ASCII chart here. And let's focus for instance on the lowercase letters here and the uppercase letters here. What's the distance between all upper and lowercase letters? It's 32, right?
And the lowercase letters are bigger. So, it stands to reason that if I just subtract 32 from the lowercase letter, it's going to immediately get me to the uppercase version thereof. So, this is kind of cool. So, I can actually go back to VS Code and I can literally subtract the number 32 in this case because ASCII is a standard. It's not going to change. Else, if the letter is not lowercase, I'm just going to go ahead and print it out unchanged without doing any mathematics at all to it. And I'll make it clear with a comment. Else, if not lowercase, makes clear what's going on there. All right, let me go ahead and make uppercase in my terminal window. ./uppercase. Let's type in my name all lowercase. And I get back DAVID. Huh, minor bug. Couple bugs, actually. Let me fix my spacing. I think I want another space after the word after. And at the very bottom of my program, I think I want a backslash n. Now, let's rerun make uppercase, ./uppercase, Enter, david. And now it's forcing it all to uppercase. Meanwhile, if I do it once more and type in my name capitalized, it's still going to force everything else to uppercase. Questions? >> Your spacing for the after. >> Oh, I'm an idiot. Okay, thank you. Yes. I misspelled after; otherwise my alignment would have worked. So let's do this again. Make uppercase, if only so that we can prove it's the same, david in all lowercase. And there we go. That was, thank you, the intent. All right. So it's kind of a little trick, but this is kind of tedious, right? Like Microsoft Word, Google Docs all have the ability to toggle case from uppercase to lowercase or lowercase to uppercase. It's kind of annoying that you have to write this much code to achieve something so simple, seemingly, and so commonplace. Well, it turns out there's a better approach here, too.
In addition to there being the string library, there's also the ctype library. In ctype.h, another header file, there's a whole bunch of other functions that are useful that relate to characters in ASCII. So, for instance, if we go ahead and use this as follows, I'm going to go ahead at the top of my file here and include now ctype.h. It turns out there's going to be functions via which I can ask these questions without writing the comparisons myself. For instance, in this next version of the program, I don't need to do any of this clever but pretty verbose math. I can just say, if the islower function, which comes from the ctype library, passing in s bracket i, returns true, we'll then convert the letter to uppercase by subtracting 32. But you know, I don't even need to do this mental math or math in code. I can also from the ctype library use a function called toupper, which takes as input a character like s bracket i, and let someone else's function do the work for me. So let me go back down to my terminal window here. Let me make uppercase now, ./uppercase, Enter, before, david. This now works, too. But if I really dig into the documentation for the ctype library, you'll see that you can just use the toupper function on any character, and it will very intelligently only uppercase it if it is actually lowercase. So someone else years ago wrote the conditional code that checks if it's between little a and little z. So knowing this, and you would see that indeed in the documentation, I don't even need this else. I can instead just get rid of this whole conditional, tighten my code up significantly here, and simply say printf, using percent c, the toupper version of that same letter, and let the function itself realize, if it's uppercase, pass it through unchanged, if it's lowercase, change it first and then return it. So now if I open my terminal window again and clear it, make uppercase, ./uppercase, Enter, david, and we're back in business.
So again, that's demonstrative of how, if you find that coding is becoming tedious or you're solving a problem that surely someone else has solved, odds are there is in fact a library function for it, whether it's from CS50 or from the standard library, that you yourselves can use. And unlike the CS50 library, which is indeed CS50 specific, which is why Clang needed to know about -lcs50, many of these libraries just automatically work. You don't need to link in the ctype library. You don't need to link in other libraries. But for non-standard libraries like CS50's training wheels for the first few weeks, we do need to do that. But make is configured to do all of that automatically for you. All right, in our final minutes together, let's go ahead now and reveal some of the details we've been sweeping under the rug about main. I asked in week one that you just take on faith that you've got to do the void, you've got to do the int, and all of that. Well, let's see why that actually is. So, main is special insofar as, in C, it is the function that will be called automatically after you've compiled and then run your code. Not all languages standardize the name of the function, but C and C++ and Java and certain other ones do. In this case, here is the most canonical, simple form of main. We know that including stdio.h just gives us access to the prototypes for functions like printf. But what's going on with int and what's going on with void? Well, void in parentheses here just means that main, and in turn all of the programs we've written up until this moment, do not take command-line arguments. Literally every program we've written, ./a.out, ./hello, ./scores, dot slash everything else, I have never once typed another word after the name of our programs that we've written in class.
That is because every program has had void inside of these parentheses, telling the computer this program does not take command-line arguments, words after the program's name. That is different from make and code and cd and other commands that you've typed with words after their names at the prompt. But it turns out the other supported syntax for the main function in C can look like this, too, which at a glance looks like kind of a mouthful, but it just means that main can take zero arguments or it can take two. If it takes two, the first is an integer and the second is an array of strings. By convention, those inputs are called argc and argv. argc is the count of words that are typed at the prompt, including the program's name. argv is the argument vector, aka array, of the actual words. In other words, now that we have the ability to use arrays, we can get zero or one or two or three or more words from users at the prompt when they run our own programs. So what do I mean by this? We can now write programs that actually have command-line arguments as follows. Let me go into VS Code here and close our old program, uppercase. Let's write a new, simpler program here in my terminal called greet.c and just greet the user in a couple of different ways. So I'm going to include initially cs50.h and then I'm going to include stdio.h here. Then I'm going to say int main void without introducing anything new just yet. I'm going to ask the user, like we did last week, for a return value from get_string, asking them what's your name, as we've done so many times. Then I'm going to say printf, hello, percent s, backslash n, spitting out their answer as follows. Same program as last week. Again, I'm going to make greet. I'm going to say ./greet and I'm prompted now for my name. I hit Enter. Notice that I did not take any command-line arguments. The only command I ran was ./greet, no other words.
Let's now use this new trick and actually let the user type their name when they're running my program rather than waste their time by using get_string and prompting them. Let me go into my editor here. Let's get rid of the CS50 library. Let's get rid of my use of get_string, and let's simply change void to int argc, then string argv, open bracket, close bracket. That's all. Down here, let's simply print out argv bracket one, for reasons we'll soon see. The only change then I'm making really is changing the prototype for main from the first version, which we've been using for like a week and a bit now, to the second version, which is the only other version supported. I'm going to go back to my terminal window now. Make greet and, darn it, so close. How do I fix the mistake I accidentally made? Yeah, in back. Oh, no. In front. >> Yes, I should have kept the CS50 library because it's in the CS50 library that string is defined. So, include cs50.h. In week four, we will delete that line for real and actually show you what string actually is. I promised at the start of class that string is a term of art, but it's not a keyword in C, and we'll see what it means in a couple of weeks' time. Okay, let me fix this. Make greet, ./greet, but now I'm going to type, before I even hit Enter, my actual name, and when I hit Enter now I see hello, David. If I instead do ./greet Kelly, Enter, now I see hello, Kelly. If I type nothing, like ./greet, Enter, I just see hello, null, which is not the same null as before. This is N-U-L-L, for reasons we'll come back to before long, but clearly printf knows something's going on. There's no actual word there. Why though did I do argv bracket one? Well, it turns out that, just as a feature of C, if I print argv bracket zero instead, recompile this program, and do ./greet and type in nothing else, I'm going to see something kind of curious: hello, ./greet. Because the zero location in the argv variable will automatically contain the program's own name.
Why is this useful? If you ever want to do something self-referential like thanks for running my program, or you want to show documentation for your program and the name of your program, which depends on whatever the file itself is called, you can use argv bracket zero, which will always contain the program's name no matter what the file has been named or renamed to. But we can fix that null issue now in a couple of ways. So argc is the other input that I said now can exist, which is the count of arguments at the prompt. So if I want to check if the user actually typed their name, I could say something like if argc equals equals two. Well, then and only then go ahead and print out their name. Else, let's just do some clever default like printf, quote unquote, hello, world, or heck, nothing at all. This version of the program now is a little smarter because when I run make greet and ./greet with my name, it works exactly as intended. But if I forget and only do ./greet, it's going to say hello, world. Moreover, if I don't quite cooperate and I say David Malan, Enter, it similarly just ignores me because the arg count is not two anymore. It's now three. So, argc contains the total number of words at the prompt, but the first one is always the program's name. Question. >> Sorry. Can you say that once more, a little louder? Why is it information that we just have or >> Oh, so the short answer is just because. Like, by the definition of C, if you look up the documentation for C, you can either define main as taking no arguments, with the word void, or you can specify that main can take two arguments, and the compiler and the operating system will just ensure that if you provide those two variables, argc and argv will be filled with those values automatically. Someone else decided that, though. That's just the way it works. You can't put three there. You can't put four there. You can change the names of those variables but not the types, because of this convention.
So there's one last feature of main, then. It's the actual value it returns. Up until now, every program I've written starts with int main something. Int main something. What is that int? We have yet to use it. Technically, the value that main returns is a so-called exit status, which is a numeric status that indicates success or failure. Numbers are everywhere in the world of computing. So for instance, here's a screenshot from Zoom whereby if something goes wrong with Zoom, like you have bad internet connectivity or something like that, you might see an error code like 1132. That means nothing to normal people unless you Google it, look up the documentation, but it means something very much to the software engineers who wrote this code because they know, oh shoot, 1132 means this error, and they probably have a spreadsheet or a cheat sheet somewhere that converts those codes to actually useful error messages. And frankly, in a better world, they would just tell you what the problem is rather than just say report the problem and mention this number. That said, on the web, odds are you're familiar with this number 404, which is also a weird thing for so many normal people to know, but this generally means file not found. It's a numeric code that signifies that something has gone wrong. Exit status isn't quite this, but it's similar in spirit. In main, you can return a value like zero or one or two or something else to indicate whether something was successful or not. By convention, a program, a function like main, returns zero on success if all is well. And that leaves you then with several hundred possible things that can go wrong because you could return one to signify one thing, two to signify another, three to signify another, and so long as you have a spreadsheet or a cheat sheet or something, you can just keep track as the programmer as to what error means what. So what does this mean in real terms?
Well, if I go over to VS Code here, let me implement a relatively simple program, our last, called status.c. So in status.c, I'm going to go ahead and use the CS50 library at the top, the standard IO library at the top, and then inside of int main, with our new format, int argc, string argv, square brackets, I'm going to now do the following. If argc does not equal two, then I'm going to go ahead and print out this time a warning. I'm not going to have some silly default like hello, world. Let's tell the user that they didn't use my program correctly. And I'm going to say printf, missing command-line argument. And we'll assume they know what that means. Then to signify an error, I'm going to say return one. It could be two, it could be three, but this is the first possible error. So I'm going to start simple with one. Otherwise, if argc does equal two and I get to this part of my code, I'm going to say hello, percent s, backslash n, and pass in argv bracket one just like before. And just to be super specific, I'm going to return zero to tell the computer, the operating system, that this is success. Zero signifies success. Any other value signifies error. Let's make status now. Let's do ./status. And this is a little magical, but let me go ahead and cooperate initially. I'm going to type in my name, David. And I'm going to see hello, David. Most people wouldn't know this, but among the commands you can type at your terminal is this one here, and the TFs and I, the TAs and I, would do something like this. We, after running your code, can do echo, space, dollar sign, question mark, and we can see secretly the return value that your program returned, zero in this case. Meanwhile, if we do this again, ./status, and let me not type my name this time. When I do this, I see missing command-line argument. What value should the code have returned, then? One. So let's see. Echo, dollar sign, question mark. There's the one.
So even after just one week of CS50, if you've ever wondered how check50 knows if your code was correct or not, among the ways we check for that is by checking this semi-secret status code, this exit status, which isn't really a secret. It's just not displayed to normal people because it's not all that enlightening unless you're the software developer who wrote the code in question. But this means we could return one in some cases or two in other cases or three or four in yet others. And these command-line arguments are sort of everywhere. And in fact, a program I skipped over a moment ago was going to be this. There's no academic value to what you're about to see. But another program that takes command-line arguments is known as cowsay. And this is sort of very famous in computing circles because it's been on systems for many years. Cowsay is a program that allows you to type in a word after the program's name, like moo, and it will print out what's called ASCII art: an adorable little cow with a speech bubble that says moo. So kind of evocative of Scratch, but it takes other command-line arguments, not just the words that you want to come out of its mouth, but even the appearance that you want it to have. So for instance, I can say -f duck and run it again. Enter. And now I have a little cute duck saying moo, which is a bit of a bug. So let me change that to quack, for instance, instead. And again, no academic value here. It's just fun to now play with the various options. But if we really want to have fun with this, we can do another one. So cowsay -f dragon. And we can say something like rawr. And now we have this crazy dragon appearing on the screen. Which is to say, again, no value here. It's just fun to play with command-line arguments sometimes. And how is cowsay doing this?
Well, someone wrote code, maybe in C or some other language, using argc and argv and poking around at their values, and maybe a conditional that says if the -f value is dragon, then print this graphic, else if the value is duck, then print this other one. It all boils down to the same fundamentals of week zero, of functions and conditionals and loops and Boolean expressions and the like. It's just being composed into more and more interesting things. And indeed, in closing, among the other interesting things we'll play with this week, to come full circle, is that of cryptography, the art of scrambling information so as to have secure communication. So important nowadays with passwords and credit card numbers and personal messages that you might want to send, and we'll have you explore through code some of the algorithms via which you yourselves can encrypt information. And there's a number of ways we can do this form of encryption, and they all boil down to this mental model. You've got some input, like the message you want to send, and you want to encipher it somehow, encrypt it somehow, so that no one knows what message you've sent. So you want your plaintext, which is the human-readable version in English or any other language, to become ciphertext ultimately. So the code you'll be writing this week is, inside of this black box, some kind of cipher, an algorithm that encrypts information so that you can do exactly this. Now, the catch is that you can't just give it plaintext and run it through an algorithm and get ciphertext, because you need to somehow have a secret, typically, for encryption to work. Like, if I'm going to send a message to someone in back, well, I could just randomize the letters that I'm writing down. But how would they know how to reverse that process? Probably what we need to do is agree in advance that, you know what, I'm going to change every A to a B and every B to a C and a C to a D and a Z to an A. I'll wrap back around at the end of the alphabet. 
It's not very sophisticated, but what middle school teacher, if they intercept two kids passing notes in class, is going to waste time trying to figure out this cipher? But it does presuppose that there's a secret between them, the number one in that case, because I'm changing every letter by one place. So how might this work? Well, if I want to encrypt the word hi, H I exclamation point, and my secret key with someone that I've come up with in advance is one, I should send the ciphertext I J exclamation point. Now, this is a simple cipher, so I'm not really encrypting the punctuation, which may or may not be a good thing, but I am encrypting at least the alphabetical letters. But what does the recipient then have to do to decrypt this message? When they see on paper I J exclamation point, how do they know what I said? Well, they use that same key but subtract. So B becomes A, C becomes B, A becomes Z, and so forth. Essentially inverting the key from positive 1 to negative 1. Of course, slightly more secure than a key of one would be 13. And in fact, in computing circles, 13 has special significance. ROT13, R O T 13, is an algorithm that's been used for many years online just to sort of avoid spoilers. Like Reddit might do this, or other websites where they want you to have to do some effort to see what the message says. But it's not all that hard. You just have to click a button or write the code that actually does this. But if you use 13 instead, you wouldn't get I J. You'd get U V, because U and V are 13 places away from H and I respectively. But again, we're not touching the punctuation. Or we could send something more personal, like I love you, and the message comes out like that. Slightly more secure than that would be ROT26. No? >> No. >> Why? Because it's the same thing. It literally rotates all the way around. A becomes A, B becomes B. So there's a limit to this. 
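The rotation just described (add the key to encrypt, subtract it to decrypt, wrap around at Z, leave punctuation alone) can be sketched in C roughly as follows. The function names rotate and caesar are illustrative choices, not something given in the lecture.

```c
#include <ctype.h>

// Rotates one letter by key places, wrapping around the alphabet.
// Anything that is not a letter (punctuation, digits, spaces) is unchanged.
char rotate(char c, int key)
{
    // Normalize the key into 0..25 so that negative keys (decryption) work
    int shift = ((key % 26) + 26) % 26;

    if (isupper((unsigned char) c))
    {
        return (char) ('A' + (c - 'A' + shift) % 26);
    }
    if (islower((unsigned char) c))
    {
        return (char) ('a' + (c - 'a' + shift) % 26);
    }
    return c;
}

// Rotates every letter of text in place.
void caesar(char *text, int key)
{
    for (int i = 0; text[i] != '\0'; i++)
    {
        text[i] = rotate(text[i], key);
    }
}
```

Calling caesar on "HI!" with a key of 1 gives "IJ!", and calling it again with a key of -1 restores "HI!". A key of 13 turns "HI!" into "UV!", and a key of 26 changes nothing, which is the ROT26 point above.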
But more seriously, that speaks to just how strong this encryption is or is not. Because if you think about this now from an adversary's perspective, like the teacher in the room intercepting the slip of paper, how much work do they need to do? Well, they just try all possibilities. Key of one, key of two, key of three, dot dot dot, key of 25. And at some point, they will see clearly that they guessed the key, which means that cipher is not very secure. Nonetheless, what we're talking about is historically known as the Caesar cipher, because back in the day, when Caesar was communicating, by legend, with his generals, if you're the first human on Earth to come up with encryption or come up with this specific cipher, it doesn't really matter how not complex it is if no one else knows what's going on. Nowadays, it's not hard at all to write some code in C or any other language that could just brute force its way through this. So there are much more sophisticated algorithms nowadays than simple rotations of letters of the alphabet, as we'll soon see. But when it comes to decryption, it really is just a matter of reversing that process. So this message here, if we rotate all the letters in the opposite direction by subtracting one, will be our final flourish for today. There's a bit of a hint there, which will reveal that this message, and our final words for us as the clock strikes 4:15, is going to be: the U becomes T and the I becomes H. I'm the only one this is amusing. H, I, S, W, A, S, C, S, 50. And this was CS50. We'll see you next time. [applause] [music] >> All right, this is CS50. This is week three. And this was an artist's rendition of what various sorting algorithms look and sound like. 
Recall from week zero that an algorithm is just step-by-step instructions for solving some problem. To sort information, as in the real world, just means to order it from, like, smallest to largest, or alphabetically, or by some other heuristic. And it's among the algorithms that we're going to focus on today, in addition to searching, which of course is looking for information, as we did in week zero too. Among the goals for today are to give you a sense of certain computer science building blocks. Like, there's a lot of canonical algorithms out there that most anyone who studied computer science would know, that anyone who leads a tech interview would ask about. But more importantly, the goal is to give you different mental models and methodologies for actually solving problems, by giving you a sense of how these real-world algorithms can be translated to actual computers that you and I can control. We thought we'd begin today with an actual algorithm for sort of taking attendance. We of course do this with scanners outside, but we can do it old school, whereby I just use my hand or my mind and start doing 1 2 3 4 5 6 7 8 9 10 11 12 and so forth. That's going to take quite a few steps, cuz I've got to point at and recite a number for everyone in the room. So I could kind of do what my, like, grade school teachers taught me, which is count by twos, which would seem to be faster. So like 2 4 6 8 10 12 14 16 18 20. And clearly that sounds and is actually faster. But I think with a little more intuition and a little more thought back to week zero, I dare say we could actually do much better than that. So, if you won't mind, I'd like you to humor us by all standing up in place and think of the number one, if you could, and join us in this here algorithm. So, stand up in place and think of the number one. So, at this point in the story, everyone should be thinking of the number one. Step two of this algorithm for you is going to be this. Pair off with someone standing. 
Add their number to yours and remember the sum. Go. Okay. At this point in the story, everyone, except maybe one lone person if we've got an odd number of people in the room, is thinking of what number? >> Two. >> Okay. So next step, one of you in each pair should sit down. Okay, good. Never seen some people sit down so fast. So those of you who are still standing, the algorithm's still going. So the next step for those of you still standing is this. If still standing, go back to step two. Ergo, repeat, or loop, if you could. And notice, if you've gone back to step two, that leads you to step three. That leads some of you to step four, which leads you back to step two. So this is a loop. Keep going. If still standing, pair off with someone else still standing. Add together, and then one of you sit down. So with each passing second, more and more people should be sitting down and fewer and fewer are standing. Okay, almost everyone is sitting down. You're getting farther and farther away from each other. That's okay. I can help with some of the math at the end here. All right, I see a few of you still standing, so I'll help out and I'll join you together. So, I see you in the middle here. What's your number? >> 32. >> 32. Okay, go ahead and sit down, and I'll pair you off with... What's your number? >> 20. >> Okay, you can go ahead and sit down. Who's still... You're still standing? >> 27. >> 27. Okay, you can sit down. >> You guys are still adding together. Who's going to stay standing? Okay. What's your number? >> The worst part is doing, like, arithmetic across a crowded room, but >> 27. >> 27. Also >> 47. >> 47. Okay, you can sit down. Is anyone still standing? Yeah, >> 15. >> Nice. 15. Okay, you can sit down. Anyone still standing? Okay, so all I've done is sort of automate the process of pairing people up at the end here. When I hit enter, we should hopefully see... Oh, the numbers are a little... What's going on there? There we go. 
When I hit enter, we'll add together all of the numbers that were left. And if you think about the algorithm that we just executed, each of you started with the number one, and then half of you handed off your number. Then half of you handed off your number. Then half of you handed off your number. So theoretically, all of these ones with which we started should be aggregated into the final count, which, if this room weren't so big, would just be in one person's mind, and they would have declared what the total number of people in the room is. I'm going to speed that up by hitting enter on the keyboard. And if your execution of this algorithm is correct, there should be 141 people in the room. According to our old-school human, though, Kelly, who did this manually, one at a time, the total number of people in the room, according to Kelly, if you want to come on up and shout it into the microphone, is of course going to be >> I don't know, something around 160, I think. >> 160. So, not quite the same. Okay, but that's pretty good. Okay, round of applause for your accuracy. [applause] Okay, so ideally, counting one at a time would have been perfectly correct. So, we're only off by a little bit. Now, presumably that's just because of some bugs in execution of the algorithm. Maybe some mental math didn't quite go according to plan. But theoretically, your third and final algorithm, wherein you all participated, should have been much faster than my algorithm or Kelly's algorithm, whether or not we were counting one at a time or two at a time. Why? Well, think back to week zero, when we did the whole phone book example, which was especially fast in its final form because we were dividing and conquering, tearing half of the problem away, half of the problem away. 
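As a rough illustration of the stand-up-and-pair-off algorithm above, here is a small C simulation. It is only a sketch: the function name count_by_pairing and the idea of reporting the number of rounds through a pointer are assumptions made for this example, and it models a bug-free room where nobody gets the mental math wrong.

```c
#include <stdlib.h>

// Simulates the attendance algorithm: everyone starts with the number 1.
// Each round, people still standing pair off, one keeps the sum, the other
// sits down. Returns the number held by the last person standing and stores
// how many rounds it took in *rounds. Returns -1 if memory runs out.
int count_by_pairing(int people, int *rounds)
{
    *rounds = 0;
    if (people <= 0)
    {
        return 0;
    }

    int *numbers = malloc(people * sizeof(int));
    if (numbers == NULL)
    {
        return -1;
    }
    for (int i = 0; i < people; i++)
    {
        numbers[i] = 1; // step one: think of the number 1
    }

    int standing = people;
    while (standing > 1)
    {
        int next = 0;
        for (int i = 0; i < standing; i += 2)
        {
            int sum = numbers[i];
            if (i + 1 < standing)
            {
                sum += numbers[i + 1]; // a lone person simply keeps their number
            }
            numbers[next++] = sum; // the partner who stays standing remembers the sum
        }
        standing = next;
        (*rounds)++;
    }

    int total = numbers[0];
    free(numbers);
    return total;
}
```

For a room of 141 people, this sketch returns 141 after 8 rounds, since the number of people standing goes 141, 71, 36, 18, 9, 5, 3, 2, 1.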
And even though it's hard to see in a room like this, it stands to reason that when all of you were standing up, we took a big bite out of the first problem, and half of you sat down, half of you sat down, half of you sat down, and theoretically there would have been, if you were closer in space, one single person with the final count. So let's see if we can't analyze this just a little bit by considering what we did. So here's that same algorithm here. Recall this is how we motivated week zero's demonstration of the phone book, in either digital form, as you might see in an iPhone or Android device, looking for someone, for instance, like John Harvard, who might be at the beginning, middle, or end of said phone book. But we analyzed that algorithm just as we can now this one. So in my very first verbalized algorithm, 1 2 3 4, you could draw that as a straight line, because the relationship between the number of people in the room and the amount of time it takes is linear. It's a straight line: with each additional person in the room, it takes me one more step. So if you think back to sort of high school math, there's sort of a slope of one there. And so this n, denoting the number of people in the room, is indeed a straight line. And on the x-axis, as in week zero, we have the size of the problem in people, and on the y-axis the time to solve in steps or seconds or whatever your unit of measure is. If and when I started counting two at a time, 2 4 6 8 10 and so forth, that still is a straight line, because I'm taking two bites consistently out of the problem until maybe the very end, where there's just one person left. But it's still a straight line, and it's strictly faster. No matter the size of the problem, if you sort of draw a line vertically, you'll see that you hit the yellow line well before you hit the red line, because it's moving essentially twice as fast. 
But that third and final algorithm, even though in reality it felt like it took a while and I had to kind of bring us to the exciting conclusion by doing some of the math, that looked much more like our third and final phone book example. Because if you think about it from an opposite perspective, suppose there were twice as many people in the room. Well, it would have taken you all theoretically just one more step. Now, granted, one more loop, and there might be some substeps in there, if you will, but it's really just fundamentally one more step. If the number of people in the room quadrupled, four times as many people, well, that's two more steps. Equivalently, the amount of time it takes to solve the attendance problem using that third and final algorithm grows very slowly, because it takes a huge number of more people in the room before you even begin to feel the impacts of that growth. And so today, indeed, as we talk about not only the correctness of algorithms, we're going to talk about the design of algorithms as well, just as we have with code, because the smarter you are with your design, the more efficient your algorithms ultimately are going to be and the slower their cost is going to grow. And by cost I mean time, like here; maybe it's money; maybe it's the amount of storage space that you need. Any limited resource is something that we can ultimately measure, and we're not going to do it very precisely. Indeed, we're going to use some broad strokes and some standard mechanisms for describing ultimately the running time, the amount of time it takes for an algorithm, or in turn code, to actually run. So, how can we do this? Well, last week, recall, we set the stage for talking about something called arrays, which were the simplest of data structures inside of a computer, where you just take the memory in your computer and you break it up into chunks, and you can store a bunch of integers, a bunch of strings, whatever, back to back to back to back. 
And that's the key characteristic for an array. It is a chunk of memory wherein all of the values therein are back to back to back. So, right next to each other in memory. So we drew this fairly abstractly by drawing a grid like this, and I said, well, maybe this is byte zero and this is byte 1 billion, whatever the total amount of memory is that you have. We zoomed in and looked at a little something like this, a canvas of memory. We talked about what and where you can put things. But today, let's just assume that we want 1 2 3 4 5 6 7 chunks of memory for the moment. And inside of them we might put something like these numbers here. Well, the interesting thing about computers is that even though, if I were to ask you all, find the number 50 in this array, I mean, our minds quickly see where it is, because we sort of have this bird's-eye view of the whole screen and it's obvious where 50 is. But the catch with computers and with code that we write is that really these arrays, these chunks of memory, are equivalent to a whole bunch of closed doors. And the computer can't just have this bird's-eye view of everything. If the computer wants to see what value is at a certain location, it has to do the metaphorical equivalent of going to that location, opening the door and looking, then closing it and moving on to the next. That is to say, a computer can only look at or access one value at a time. Now, that's in the simplest form. You can build fancier computers that theoretically can do more than that, but all the code we write generally is going to assume that model. You can't just see everything at once. You have to go to each location in these here lockers, if you will. Starting today, too, when we talk about the locations in memory, we're going to use our old zero-indexing vernacular. That is to say, we start counting from zero instead of one. So this will be locker zero, locker one, locker two, dot dot dot, all the way up to locker six. 
So just ingrain in your mind that if you hear something like location six, that's actually implying that there's at least seven total locations, because we started counting at zero. So that's intentional. We don't have in the real world yellow lockers, so we're going to make this metaphor red instead. We do have these lockers here. And suppose that within these seven lockers physically on stage, we've put a whole bunch of money, Monopoly money, if you will, but the goal initially here is going to be to search for some specific denomination of interest and use these physical lockers as a metaphor for what your computer's going to do and what your code ultimately is going to do. If we're searching for the solution to a problem like this, the input to the problem at hand is seven lockers, all of whose doors are metaphorically closed. The output of which we want to be a bool, a true or false answer. Yes or no: that number is there, or no, it is not. So inside of this black box today is going to be the first of our algorithms, step-by-step instructions for solving some problem, where the problem here is to find, among all of these dollar bills, specifically the $50 bill. If we could get two volunteers to come on up who are ideally really good at Monopoly. Okay. How about over here in front? And how about, let me look a little farther in back. Okay. Over here, there in back. Come on down. All right. As these volunteers kindly come down to the stage, we're going to ask them in turn to search for specifically the $50 bill that we've hidden in advance. And if my colleague Kelly could come on up too, because we're going to do this twice: once searching with one algorithm and a second time with another. Let me go ahead and say hello, if you'd like to introduce yourselves to the group. >> Hey, I'm Jose Garcia. >> Hi, I'm Caitlyn Cow. >> All right, Jose and Caitlyn. Nice to meet you both. 
Come on over, and let me go ahead and propose that, Jose, the first algorithm that I'd like you to do is to find the number 50. And let's keep it simple. Just start from the left and work your way to the right. And with each time you open the door, stand over to the side so people can see what's inside, and just hold the dollar amount up for the world to see. All right, the floor is yours. Find us the $50 bill. 20. >> Shut it? >> No, that's good. That's good acting, too. Thank you. No, you can shut it, just like the computer. All right. No. Very clear. Thank you. Still no. $10 bill. Next locker. $5 bill. Not going well. $100 bill, but not the one we want. This one. Hm, $1 bill. Still no 50. Of course, you've been sort of set up to fail, but here, amazing. A round of applause. Jose found the $50 bill. [applause] All right. So, let me ask you, Jose, you found the $50 bill. It clearly took you a long time. Just describe in your own words, what was your algorithm, even though I nudged you along. >> Yeah. So, my algorithm was basically walk up to the first door available, open it, check if the dollar bill was the dollar bill that I was looking for, and then put it back, and then go to the next one. >> Okay. So, it's very reasonable, because if the $50 bill were there, Jose was absolutely going to find it eventually, if slowly. In the meantime, Kelly's going to kindly reshuffle the numbers behind these doors here. And even though Jose took a long time here, I mean, wouldn't it have been smart to start from the other end instead, do you think? >> Um, not necessarily, because we don't know if the 50 is going to be at that end. >> Exactly. So, he could have gotten lucky if he sort of flouted my advice and didn't start on the left, but instead started on the right. Boom, he would have solved this in one step, but in general that's not really going to work out. Maybe half the time it will. You'll get lucky half the time; half the time you won't. 
But that's not really a fundamental change in the algorithm, whether you go left to right or right to left. To Jose's point, if you don't know anything a priori about the numbers, the best you can probably do is just go through linearly, left to right or right to left, so long as you're consistent. Now, could you have jumped around randomly? >> Uh, I guess I could have, but again, if they weren't in any, like, specified order, I don't think it would have helped either. >> Yeah. So, additionally, if he just jumped around in random order, he might get lucky, and it might be in the very first one, and it might have taken fewer steps ultimately, but presumably you're going to have to then keep track of, like, which locker doors have you opened. So, that's going to take some memory or space. Not a big deal with seven lockers. But if it's 70 lockers, 700 lockers, even random probably isn't going to be the best job. So, let me go ahead and take the mic away and hand it over to Caitlyn. You can stay on the stage with us. Caitlyn, what I'd like you to do is approach this a little more intelligently by dividing and conquering the problem, but we're going to give you an advantage over Jose. Kelly has kindly sorted the numbers from smallest to largest, from left to right. So, accordingly, what's your strategy going to be? >> Start in the middle. >> Okay, please. And go ahead as before and reveal to the audience what you found. Not the 50, the 20. But what do you know, Caitlyn, at this point? >> It'll be... on the left is less. >> Correct. So the 20 is going to be to the left. So where might you go next with this three-locker problem? Let me propose that you maybe go to the middle of the three. There we go. The middle of the middle. Like, that would have been good. But let's >> Oh no. >> Oh no. It's a 100 instead. You failed. But what do you now know? >> It's in the middle. >> That I should have just let you. But now we have a big round of applause for Caitlyn for having found the 50 as well. 
[applause] Okay. So, the one catch with this particular demo is that because they know, presumably, what Monopoly money denominations are, because we just did this exercise and we had the whole cheat sheet on the board, you probably had some intuition as to, like, where the 50 was going to be, even though I was trying to get you to play along. But in the general case, if you don't know what the numbers are and that they're the specific denominations, but you do know that they're going from smallest to largest, going to the middle, then the middle of the middle, then the middle of the middle again and again would have the effect of starting with a big problem and halving it, halving it, halving it, just like the phone book as well. So, thanks to you both. We have these wonderful parting gifts that we found in Harvard Square. If you like Monopoly, you'll love the Cambridge edition, filled with Harvard Square name spots. So, thank you to you both, and a round of applause for our volunteers here. [applause] >> All right. So, let's see if we can't formalize a little bit these two algorithms, known as linear search, insofar as Jose was searching essentially along a line, left to right, and binary search, bi implying two, because we were halving that problem in two again and again and again. So for instance, with linear search, from left to right or equivalently right to left, we could document our pseudocode as follows. For each door from left to right, if the 50 is behind the door, well, then we're done. Just return true. That's the Boolean value, which was the goal of this exercise, to say yes, here is the 50. Otherwise, at the very bottom of this pseudocode, we could just say return false. Because if you get all the way through the lockers and you have never once declared true by finding the 50, you might as well default at the very end to saying false. I did not find it. 
But notice here, just like in week zero when we talked about pseudocode for searching the phone book, my indentation, of all things, is actually very intentional. This version of this code would be wrong if I instead used our old friend if else and made this conditional decision. Why is this code, now in red, wrong in terms of correctness? Yeah? >> If it's not behind the first door, it'll return false. >> Exactly. Because if the number 50 is not behind the first door, the else is telling you right then and there, return false. But as we've seen in C code, whenever you return a value, like, that's it for the function. It is done doing its work. And so if you return false right away, not having looked at the other six lockers, you may very well get the answer wrong. So the first version of the code, where there wasn't an else but rather this explicit line of code at the very end that just says, if you reach this line of code, return false, that addresses that problem. And to be clear, even though it's right after an indented return true, when you return a value, as in C, that's it. Like, execution stops at that point, at least for the function or, in this case, the pseudocode in question. All right, so here's a more computer-sciencey way of describing the same algorithm. And even though it starts to look a little more arcane, the reality is, when you start using variables and sort of standard notation, you can actually express yourself much more clearly and precisely, even though it might take a little bit of practice to get used to. Here is how a computer scientist would express that exact same idea. Instead of saying for each door from left to right, we might throw some numbers on the table. So for i, a variable apparently, from the value zero on up through the value n minus one, is what this shorthand notation means, if 50 is behind doors bracket i, so to speak. 
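The linear search pseudocode above translates fairly directly into C. This is a minimal sketch under a couple of assumptions: the doors are modeled as an array of int, and the function name linear_search and its parameters are chosen for this example rather than taken from the lecture.

```c
#include <stdbool.h>

// Looks behind doors[0] through doors[n - 1], one at a time, left to right.
bool linear_search(const int doors[], int n, int target)
{
    for (int i = 0; i < n; i++)
    {
        if (doors[i] == target)
        {
            return true; // found it, so the function is done
        }
        // Deliberately no else here: returning false at this point would
        // give up after checking only the first door.
    }

    // Reached only after every door has been checked without success
    return false;
}
```

With seven doors holding, say, 20, 500, 10, 5, 100, 1, 50 (an illustrative arrangement, not necessarily the one on stage), searching for 50 returns true only after opening all seven doors, and searching for a value that is not there returns false.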
So now I'm sort of treating the notion of doors as an array, using our notation from last week. If 50 is behind doors bracket i, return true. Otherwise, if you get through the entirety of that array of doors, you can still return false. Now notice here, n minus one seems a little weird, because aren't there n doors? Why do I want to go from 0 to n minus one instead of 0 to n? Yeah? >> Because zero is the first block. >> Exactly. If you start counting at zero and you have n elements, the last one is going to be addressed as n minus one, not n, because if it were n, then you'd actually have n + 1 elements, which is not what we're talking about. So again, just a standard notation, and it's a little terser this way. It's a little more succinct, and frankly it's a little more adaptable to code. And so what you're going to find is that as our problem sets and programming challenges that we assign sort of get a little more involved, it's often helpful to write out pseudocode like this, using an amalgam of English and C and eventually Python code, because then it's way easier after to just translate your pseudocode into actual code if you're operating at this level of detail. All right. So, in the second algorithm, where Caitlyn kindly searched for 50 again, but Kelly gave her the advantage of sorting the numbers in advance, now she doesn't have to just resort to brute force, so to speak, trying all possible doors from left to right. She can be a little more intelligent about it and pick and choose the locker she opens. And so, with binary search, as we call that, we could implement pseudocode for it as follows. We might say, if 50 is behind the middle door, then go ahead and return true. Else if it's not behind the middle door, but 50 is less than that number behind the middle door, we want to go and search the left half. So that didn't happen in Caitlyn's case, because we ended up going right. So that's just another branch here. 
Else, if 50 is greater than what was at the middle door, we want to search the right half. But there's going to be one other condition here that we should probably consider. Is it here? Is it to the left? Or is it to the right? But there's another corner case that we'd better keep track of. What else could happen? >> If it's not in the array. >> Or really, like, we're out of doors. So we can implement this in a different way. I left myself some space at the top, because I shouldn't do any of this if there are no doors to search. So, I should have this sort of sanity check, whereby if there's no doors left or no doors to begin with, let's just immediately return false. And why is that? Well, notice that when I say search left half and search right half, this is implicitly telling me, just do this again. Just do this again, but with fewer and fewer doors. And this is a technique for solving problems and implementing algorithms that we're going to end today's discussion on, because what seems very colloquial and very straightforward, okay, search the left half, search the right half, is actually a very powerful programming technique that's going to enable us to write more elegant code, sometimes less code, to solve problems such as this. And more on that in just a little bit. But how can we now formalize this using some of our array notation? Well, it looks a little more complicated, but it isn't really. Instead of asking questions in English alone, I might say, if 50 is behind doors bracket middle. This pseudocode presupposes that I did some math and figured out what the numeric address, the numeric index, is of the middle element. And how can I do that? Well, if I've got seven doors and I divide by two, what's that? 7 divided by 2, three and a half. Three and a half makes no sense if I'm using integers to address this. So maybe we just round down. So three. So that would be locker number 0 1 2 3, which indeed, if you look at the seven lockers, is in fact the middle. 
So this is to say, using some relatively simple arithmetic, I can figure out what the address is, the index is, of the middle door if I know how many there are and I divide by two and round down. Meanwhile, if I don't find 50 behind the middle door, let's ask the question. If 50 is less than the value at the middle door, then let's search not the left half per se in the general sense. More specifically, search doors bracket zero through doors bracket middle minus one. Otherwise, if 50 is greater than the value at the middle door, go ahead and search doors bracket middle + 1 through doors bracket n minus one. Now let's consider these in turn. So searching the left half, as we described this earlier, seems to line up with this idea, like, start searching from doors bracket zero, the very first one. But why are we searching through doors bracket middle minus one instead of doors bracket middle? Yeah? >> Middle. >> Yeah, exactly. We already checked the middle door by asking this previous question. And so you're just wasting everyone's time if you divide the half and still consider that door as checkable again. And same thing here. We check middle plus one through the end of the lockers array because we already checked the middle one. So, same reason, even though it just kind of complicates the look of the math, but it's really just using variables and arithmetic to describe the locations of these same lockers. But let's consider now what we mean by running time, the amount of time it takes for an algorithm to run, and consider which and why one of these algorithms is better than the other. So in general, when talking about running time, we can actually use pictures like this. This is not going to be some, like, very low-level mathematical analysis where we count up lots of values. It's going to be broad strokes, so that we can communicate to colleagues, to other humans generally, whether an algorithm is better than another and how you might compare the two. 
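The binary search pseudocode above, including the no-doors-left check and the middle minus one and middle plus one bounds, can be sketched in C as follows. The function name binary_search and the low and high parameters are assumptions for this example. The lecture's pseudocode always writes the range as 0 through n minus one, whereas this sketch passes the current range explicitly so that it can call itself on a half.

```c
#include <stdbool.h>

// Searches the sorted range doors[low] through doors[high], inclusive.
bool binary_search(const int doors[], int low, int high, int target)
{
    // No doors left (or none to begin with): the value cannot be here
    if (low > high)
    {
        return false;
    }

    // Integer division rounds down; for doors 0 through 6 the middle is 3
    int middle = low + (high - low) / 2;

    if (doors[middle] == target)
    {
        return true;
    }
    else if (target < doors[middle])
    {
        // The middle door was already checked, so stop at middle - 1
        return binary_search(doors, low, middle - 1, target);
    }
    else
    {
        // Likewise, resume at middle + 1
        return binary_search(doors, middle + 1, high, target);
    }
}
```

For seven sorted doors holding 1, 5, 10, 20, 50, 100, 500, the call binary_search(doors, 0, 6, 50) looks at index 3 (the 20), then index 5 (the 100), then index 4 (the 50), the same three doors opened on stage.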
So here for instance is a pictorial analysis of two different algorithms. It's the phone book from week zero and then the attendance taking from today itself. And let's generally, as we've done before, sort of label these things. So the very first algorithm took n steps in the very worst case, if I had to search the whole phone book or if I had to count everyone in the room. So the first algorithm took indeed n steps. The second algorithm took half as many, plus one maybe, but we'll keep it simple. So we'll call that n/2. And the third and final algorithm, both in week zero with the phone book and today with attendance, is technically log base 2 of n. And if you're a little rusty on your logarithms, that's fine. Just take on faith that log base 2 alludes to taking a problem of size n and dividing it in half and half and half as many times as you can until you're left with one person standing or one page in the phone book. That's how many times you can divide in half a problem of size n. Well, it turns out that we're getting a little more detailed than most computer scientists care to get when describing the efficiency of algorithms. So in fact we're going to start to use some more common notation. Instead of worrying precisely, mathematically, about how many steps today's and the future's algorithms take, we're going to talk in broader strokes about how many steps they are on the order of, and we're going to use what's called big O notation, which literally is like a big O and then some parentheses, and you pronounce it big O of such and such. So the first algorithm seems to be in big O of n, which means it's on the order of n steps, give or take some. This algorithm here, you might be inclined to do something similar. Ah, it's on the order of n/2 steps, and ah, this one's on the order of log base 2 of n steps. But it turns out what we really care about with algorithms is how the time grows as the problem itself grows in size.
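To make "divide it in half as many times as you can" concrete, here is a small sketch that counts the halvings. The function name halvings is made up for this illustration, and it uses integer division, so the count is log base 2 of n rounded down.

```c
// Counts how many times a problem of size n can be cut in half
// (integer division) before only one item is left.
int halvings(int n)
{
    int steps = 0;
    while (n > 1)
    {
        n /= 2;   // throw away half of the problem
        steps++;
    }
    return steps;
}
```

Under these assumptions, a thousand items need 9 halvings, a million need 19, and a billion need only 29, which is why the logarithmic curve stays so flat as n grows.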
So the bigger n gets, the more concerned we are over how efficient our algorithm is, if only because today's computers are so darn fast. Whether you're crunching a thousand numbers or 2,000 numbers, it's going to take like a split second no matter what. But if you're crunching a thousand numbers versus a million numbers versus a billion numbers, that's where things start to actually be noticeable by us humans and we really start to care about these values. So in general, when using big O notation like this, you ignore lower order terms or, equivalently, you only worry about the dominant term in whatever mathematical expression is in question. So big O of n remains big O of n. Big O of n/2? Eh, it's the same thing really as big O of n. Like, it's not really, but they're both linear in nature. One grows at this rate, one grows at this rate instead. But it's for all intents and purposes the same. They're both growing at a constant rate. This one too, ah, it's on the order of log of n, where the base is who cares. In short, what does this really mean? Well, imagine in your mind's eye that we were about to zoom out on this graph such that instead of going from 0 to like a million, maybe now the x-axis is 0 to a billion. And same thing for the y-axis, 0 to a billion. Let's zoom out. So, you're seeing 0 to a billion. Well, in your mind's eye, you might imagine that as you zoom out, essentially things just get more and more compressed visually because you're zooming out and out and out, but these things still look like straight lines. This thing still looks like a curved line, which is to say as n gets large, clearly this green algorithm, whatever it is, is more appealing, it would seem, than either of these two algorithms. And if we keep zooming out, at some point the ink is going to be so close together that they are, for all intents and purposes, pretty much the same algorithm.
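The two simplifications above (n/2 counts as n, and the base of the logarithm is "who cares") can be written out in a couple of lines. This is a sketch using the standard textbook definition of big O, which the lecture does not state explicitly; the constants c and n_0 belong to that definition, not to anything shown on the slides.

```latex
% Definition assumed here: f(n) is in O(g(n)) if there exist constants
% c > 0 and n_0 such that f(n) <= c * g(n) for all n >= n_0.
\[
\frac{n}{2} \le 1 \cdot n \quad \text{for all } n \ge 1
\;\Longrightarrow\; \frac{n}{2} \in O(n)
\]
% Changing the base of a logarithm only multiplies it by a constant.
\[
\log_2 n = \frac{\log_{10} n}{\log_{10} 2} \approx 3.32 \, \log_{10} n
\;\Longrightarrow\; \log_2 n \in O(\log n) \text{ for any fixed base}
\]
```

In both cases the difference is a constant factor, which is exactly what the notation is designed to ignore.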
So this is to say computer scientists don't care about lower order terms like divide by two or base 2 or anything like that. We look at the most dominant term that really matters as n gets bigger and bigger. So that then is big O notation, and it's something we'll start to use pretty much recurringly anytime we analyze or speak to how good or how bad some algorithm is. So here's a little cheat sheet of common running times. So for instance here's our friend big O of n, which means the algorithm takes on the order of n steps. Here is one that takes on the order of log n steps. Here are some others we haven't seen yet. Some algorithms take n times log n steps. Some algorithms take n squared steps, and some algorithms just take one step, or maybe two steps or four steps or 10, but a constant number of steps. So let me ask, of the algorithms we've looked at thus far, for instance linear search being the very first today, what is the running time of linear search in big O notation? That is to say, if there's n lockers on the stage, how many steps might it take us to find a number among those n lockers? Big O of... yeah? >> Big O of n in fact is exactly where I would put linear search. Why? Well, if you're using linear search, in the very worst case, the number you're looking for, as with Jose, might be all the way at the end. So, you might get lucky. It might not be at the very end, but generally, it's useful to use this big O notation in the context of worst case scenarios because that really gives you a sense of how badly this algorithm could perform if you just get really unlucky with your data set.
So even though big O really just refers to an upper bound, like how many steps might it take, it's generally useful to think about it in the context of the worst case scenario, like, ah, the number I care about is actually way over here. But what about binary search? Even in the worst case, so long as the data is sorted, how many steps might binary search take by contrast? >> Big O of log n. So binary search we're going to put here, which is to say that in general, and especially as n gets large, binary search is much faster. It takes much less time. Why? Because assuming the numbers are sorted, you will be dividing in half and half and half, just like with the phone book problem in week zero, and you will get to your solution much faster. Why should you not use binary search though on an unsorted array of lockers, like a random set of numbers? Yeah. >> You could just get rid of the value because you don't know like what the inequality is going to be. >> Exactly. You're making these decisions based on inequalities, less than or greater than, but based on no rhyme or reason. You're going left, going right, but there's no reason to believe that smaller numbers are this way and bigger numbers are that way. So, you're just making incorrect decision after incorrect decision. So, you're probably going to miss the number altogether. So, binary search on an unsorted array is just incorrect. Incorrect usage of the algorithm. But, like Kelly did, if you sort the data in advance or you're handed sorted data, well, then you can in fact apply binary search perfectly and much more efficiently. >> I have a question. Is there ever a case where linear search is more efficient just because of the process of sorting the data yourself? >> Absolutely. Is linear search sometimes more efficient if it's going to take you more time to sort the data and then use binary search? Absolutely.
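To see binary search "miss the number altogether" on unsorted data, here is a hedged sketch that runs both algorithms over the same arrays. The function names and the iterative form of binary search are choices made for this illustration rather than the lecture's code.

```c
#include <stdbool.h>

// Linear search: correct on any array, sorted or not
bool linear_search(const int a[], int n, int target)
{
    for (int i = 0; i < n; i++)
    {
        if (a[i] == target)
        {
            return true;
        }
    }
    return false;
}

// Binary search: only correct if a[] is sorted in ascending order
bool binary_search(const int a[], int n, int target)
{
    int low = 0;
    int high = n - 1;
    while (low <= high)
    {
        int middle = low + (high - low) / 2;
        if (a[middle] == target)
        {
            return true;
        }
        else if (target < a[middle])
        {
            high = middle - 1;   // go left
        }
        else
        {
            low = middle + 1;    // go right
        }
    }
    return false;
}
```

With the lockers in their unsorted order 20, 500, 10, 5, 100, 1, 50, binary search for 20 looks at 5, goes right, looks at 1, goes right, looks at 50, goes left, and runs out of doors, even though 20 is sitting in the very first locker. Linear search finds it immediately.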
And that's going to be one of the design decisions that underlies any implementation of an algorithm because if it's going to take you some crazy long time not to sort like seven numbers but 70 700 7,000 7 million but you only need to search the data once then what the heck are you doing? Like why are you wasting time sorting the data if you only care about getting an answer once? You might as well just use linear search or heck do it even randomly and hope you get lucky if you don't care about reproducing the same result. Now in general that's not how much of the world works. For instance, Google's working really hard to make faster and faster algorithms because we are not searching Google once and then never again doing it. we're doing it again and again and again. So they can amortize, so to speak, the cost of sorting data over lots and lots of searches. But sometimes it's going to be the opposite. And I think back to graduate school where I was often writing code to analyze large sets of data. And I could have done it the right way, sort of the CS50 way by fine-tuning my algorithm and thinking really hard about my code. But honestly, sometimes it was easier to just write really bad but correct code, go to sleep for seven hours, and then my computer would have the answer by morning. The downside, as admittedly happened more than once, is if you have a bug in your code and you go to sleep and then seven hours later you find out that there was a bug, you've just wasted the entire evening. So there too, a trade-off sometimes when making those resource decisions. But that's entirely what today is about, making informed decisions. And sometimes maybe it's smarter and wiser to make the more expensive decision, but not unknowingly, at least knowingly. All right, so there might we have our first two algorithms, but let's consider another way of describing the efficiency of an algorithm. Big O is an upper bound. 
Sort of how bad can it get in these cases where maybe the data is really not working to our advantage. Omega, a capital omega symbol here, is used for lower bounds. So maybe how lucky might we get in the best case, if you will. How few steps might an algorithm take? Well, in this case here, here's just a cheat sheet of common runtimes, even though there's an infinite number of others, but we'll generally focus on functions like these. Let's consider those same algorithms. So with linear search from left to right, how few steps might that algorithm take, for instance, in the best case scenario? Yeah. Is this hand about to go up? >> Yeah. So one step. Why? Because maybe Jose could have gotten lucky and opened this door and voila, that was the 50. It didn't play out that way, but it could have. In the general case, the number you're looking for could very well be at the beginning. So we're going to put linear search at omega of one. So one step, and maybe it's technically a few more than that, but it's a fixed number of steps that has nothing to do with the number of lockers. Case in point, if I gave you not seven but 70 lockers, he could still get lucky and still take just one step. So omega is our lower bound. Big O is our upper bound. Ah, spoiler. What is binary search's lower bound? Well, apparently it's also omega of one. But why? That is in fact correct. Yeah. >> You could just get lucky again. >> Same reason. You could get lucky in the best case and it's just smack dab in the middle of all of the data. So the fewest number of steps binary search might take is also actually one. So this is why we talk about upper bound and lower bound, because you get kind of a sense of the range of performance. Sometimes it's going to be super fast, which is great, but something tells me in the general case we're not going to get lucky every time we use an algorithm. So it's probably going to be closer to those upper bounds, the big O.
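The gap between the lower bound and the upper bound for linear search can be made visible by counting doors opened. This is a small sketch with a made-up function name, linear_search_steps; it returns how many elements were inspected before the search stopped.

```c
// Returns how many elements a left-to-right linear search inspects
// before it either finds target or runs out of elements.
int linear_search_steps(const int a[], int n, int target)
{
    int steps = 0;
    for (int i = 0; i < n; i++)
    {
        steps++;                 // one more door opened
        if (a[i] == target)
        {
            return steps;        // found: stop early
        }
    }
    return steps;                // not found: every door was opened
}
```

On the seven lockers 20, 500, 10, 5, 100, 1, 50, searching for 20 takes one step (the omega of one case), while searching for 50, or for a value that is absent, takes all seven (the big O of n case).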
Now, as an aside, there's a third and final symbol that we use in computer science to describe algorithms: that of a capital theta. Capital theta is jargon you can use when big O and omega happen to be the same. And we'll see that today. Not always, but here's a similar cheat sheet. None of the algorithms thus far can be described in this way with theta notation because their big O and omega are not the same. They differed in both of our analyses. But we'll see at least one example of one where it's like, okay, we can describe this in theta, and that's like conveying twice as much information with your words to another computer scientist rather than giving them both the upper and the lower bounds. The fancy way of describing all of what we're talking about here, big O, omega, and theta, is asymptotic notation. And asymptotic notation, or asymptotically, refers to a value getting bigger and bigger and bigger and bigger but not necessarily ever hitting some boundary. As n gets very large, in short, is what we mean when we deploy this here asymptotic notation. All right. So, with the first of these things, like linear search, let's actually kind of make this a bit more real. Let me actually go over, in just a moment, to my other screen here. Okay, in VS Code, let me go ahead and create a program called search.c. And in search.c, let's go ahead and implement a fairly simple version of linear search initially. So, let me go ahead and include, for instance, cs50.h. Let me go ahead and include stdio.h. Then, let me go ahead and do int main(void). So, we're not going to bother with any command line arguments for now. And then let me go ahead and just give myself an array of numbers to play with. And we did this briefly last week in answer to a question, but I'm going to do it now concretely rather than use something more manual to get all of these numbers into the array. I'm going to say give me an array called numbers.
And the numbers I want to put in this array initially are going to be the exact same denominations we've been playing with: 20, 500, 10, 5, 100, 1, and 50. Again, this is notation that I alluded to in answer to a question last week whereby if you want to statically initialize an array, that is, give it all of your values up front without having the human type them all in manually, you can use curly braces like this. And the compiler is pretty smart. You don't have to bother telling the compiler how many numbers you want, 1 2 3 4 5 6 7, because it can obviously just count how many numbers are in the curly braces, but you could explicitly say seven there so long as your counting is in fact correct. So on line six, this gives me an array of seven numbers initialized to precisely that list of numbers from left to right. All right, let's ask the human now what number they want to search for, just as I did our two volunteers, and say int n = get_int. Then let's just ask the user for the number that they want to search for. Then let's implement linear search. And if I want to implement linear search in terms of the programming constructs we've seen thus far, what keyword in C should I use? What programming technique? Yeah. Yeah. So, maybe a for loop or a while loop, but a for loop is kind of my go-to lately. So, let's do for int i = 0, because we'll start counting from the left. i is less than seven, which isn't great to hardcode, but I'm not going to use the seven again. So, I think it's okay in one place for this demo.
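The point that the compiler counts the values in the curly braces can be checked directly. The sketch below uses the sizeof idiom, which the lecture does not use at this point; it is shown here only as one standard way to ask the compiler what length it inferred, and count_numbers is a made-up name.

```c
#include <stddef.h>

// Returns how many elements the compiler counted in the initializer list.
size_t count_numbers(void)
{
    // No size in the brackets: the compiler infers it from the braces
    int numbers[] = {20, 500, 10, 5, 100, 1, 50};

    // Total bytes in the array divided by the bytes in one element
    return sizeof(numbers) / sizeof(numbers[0]);
}
```

A length computed this way could replace the hardcoded 7 in the loop condition, though the lecture keeps the literal for simplicity.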
Then i++. Then inside of this loop, let's go ahead and ask a question, just like Jose was by opening each of the doors, by saying if numbers[i] == n, the number we asked about, well, then let's go ahead and print out some informative message like Found\n, and then for good measure, like last week, let's return zero to signify success. It's sort of equivalent to returning true, but in main, recall, you have to return an int. That's why we revealed at the end of week two that the return type of main is an int, because that is what gives the computer its so-called exit status, which is zero if all is well or anything other than zero if something went wrong. But I think finding the number counts as all is well. But if we get through that whole loop and we still haven't printed Found or returned zero, I think we can go ahead and safely say Not found\n, and then let's just return one as our exit status to indicate that we didn't find the actual number. So in short, I think, in C, this is linear search. Let me open up my terminal window again. Let me make search, enter. Let me do ./search, enter. And I'll search for, as I asked Jose, the number 50. And we indeed found it at the end. Let me go ahead and rerun ./search. And let's search for the other number, at the beginning, 20. That then works. And just to get crazy, let's search for a number we know not to be there, like a thousand. And that in fact is not found. So I think we have an implementation then of linear search. But let me pause here and ask if there's any questions with this here code and the translation of algorithm to C. Yeah, in the back? Why I did not specify the length of the array. So it is not necessary, when declaring an array and setting it equal to some known values in advance, to specify in the square brackets how many you have, because the compiler is not an idiot. It can literally count the numbers inside of the curly braces and just infer that value.
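Pieced together from the narration, the linear search looks roughly like the sketch below. One deliberate difference: the lecture reads n with get_int from cs50.h inside main, whereas this version takes n as a parameter of a made-up function, search_numbers, so that it depends only on the standard library. The return values mirror the exit statuses described above.

```c
#include <stdio.h>

// Returns 0 (success) if n is among the numbers, 1 otherwise,
// printing the same messages as the lecture's program.
int search_numbers(int n)
{
    int numbers[] = {20, 500, 10, 5, 100, 1, 50};

    // 7 is hardcoded here, as in the lecture's demo
    for (int i = 0; i < 7; i++)
    {
        if (numbers[i] == n)
        {
            printf("Found\n");
            return 0;
        }
    }

    // Only reached if the loop never found n
    printf("Not found\n");
    return 1;
}
```

Note that the Not found branch sits after the loop rather than in an else inside it; otherwise the function would give up after inspecting only the first element.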
You could put it there, but arguably you're opening up the possibility that you're going to miscount and you're going to put seven here but eight numbers over there, or six numbers there. So it's best not to tempt fate and just let the compiler do its thing instead. A good question. Other questions on this code so far? All right, if none, let's go ahead and maybe convert this linear search to one that's maybe a little more interesting, that involves searching for strings of text. After all, we started the class in week zero by searching for names in a phone book, like John Harvard. Let's see if we can't adapt our code for searching for strings instead of integers. So, in my code here, let's go ahead and delete everything inside of main just to give myself a clean canvas. Let me go ahead and give myself another array. This one, let's just call it strings, because that's the goal of this exercise. And set them equal to some familiar pieces from the game of Monopoly, if you might have played. So, there's like a battleship piece in there, there's a boot in there, there's a cannon in there, an iron, a thimble, and a top hat. Though it does vary nowadays based on the edition that you have. So kind of a long array, but I have 1 2 3 4 5 6 total values in this array of strings. Now let's ask the user for a string. We'll call it s for short. And say with get_string, what string are you looking for among those six? Then I think we can do a for loop again: for int i = 0, i less than 6, i++. And then inside of this loop, let's do the same thing. If, let's say, strings[i] == s, the string that the human typed in, I think we can go ahead and printf Found\n and then, as before, return zero to signify success. And if we don't, after that whole for loop, let's printf Not found\n down here and return one to signify error. So, it's really the same thing at the moment, except that I'm actually using strings instead of integers.
All right, let me go ahead and open up my terminal window again and clear it. Let me go ahead and recompile this code. make search. Seems to compile. Okay, let me do ./search and let's go ahead and search for the first one. How about battleship, enter? Huh, not found. All right. Well, maybe a typo. Let me search for something easier to spell. boot. Not found. That's weird. Both of those are at the very start of the array. Let's do ./search again and search for top hat. Enter. Not found. What is going on? Well, it isn't actually that obvious what I'm doing wrong. But it turns out that when we actually compare strings instead of integers in C, we're going to have to use this other library, at least today, that we saw briefly last week. Last week we introduced it because of a function called strlen, which gives us the length of a string. Turns out that string.h also comes, per its documentation, with another useful function called strcmp, for string compare, and its purpose in life is to actually compare two strings, left and right, to make sure they are in fact the same. So for today's purposes, suffice it to say you cannot use ==, apparently, to compare two strings. Intuitively, why is that? Well, for a computer, it's super easy to compare two integers because they're either there or they're not in memory. But with a string, it's not just a character and another character. It's like a few characters over here and a few characters over here. Maybe it's a few, maybe it's more. You have to compare each and every character in a string to make sure they're in fact the same. So, strcmp does exactly that. Probably in the implementation of strcmp, from like years ago, someone wrote a while loop or a for loop that looks at each string left to right and compares each and every one of the characters therein and then gives us back an answer. So how do we go about using this?
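The lecture defers the full explanation of why == fails, but the behavior is easy to reproduce. In C, applying == to two strings compares where they are stored in memory, not the characters themselves. The sketch below uses plain const char * in place of CS50's string type (which cs50.h defines as an alias for char *), and the two helper names are invented for this illustration.

```c
#include <stdbool.h>
#include <string.h>

// What == does on strings: asks whether both refer to the same
// location in memory.
bool same_address(const char *a, const char *b)
{
    return a == b;
}

// What strcmp does: compares the characters, left to right.
bool same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Two separate arrays that both hold the characters of boot have identical text but live at different addresses, so same_address reports false while same_text reports true. That is the situation when the typed-in string is compared with one stored in the array.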
Well, to use strcmp, what I can actually do in VS Code here is go and change my code as follows. Instead of using ==, I'm going to actually use this function per its documentation. I'm going to call strcmp. Then I'm going to pass in one of the strings, which is in strings[i]. Then I'm going to pass in the second string, which is s. However, having read the documentation, and this is a little non-obvious, it turns out that strcmp will return zero if the strings are equal. Otherwise, it's going to return a positive number or a negative number. So what I care about for now is: does the return value of strcmp, when given those two strings, give me back zero? If so, they are equal and I'm going to say quote unquote Found. So, let's go ahead and open the terminal again. Let me go ahead and clear it and do make search to recompile my code. And huh, I've done something wrong. Let's see. Let me scroll up to the very first line. In line 11, error: call to undeclared library function strcmp with type int and something something, which gets more complicated after that. Why is line 11 not working despite what I just preached? Yeah. >> Yeah. I just did something stupid. I didn't include the string.h header file. So all clang, our compiler, is doing when invoked by make is encountering literally the word strcmp and not knowing what it is, because we haven't taught it what it is by simply saying include string.h at the top. Okay, let me reopen my terminal window. Clear that message away. Do make search again. Now it's compiling. ./search. Enter. Now I'm going to go ahead and search, as I did before, for battleship. Ah, now it's finding it. Let me run ./search again. Search for boot. Ah, okay, that's found. Let me go ahead and search for top hat. That too is in there. Let me go ahead and search for something that's not there, like the number 50. Not in fact found. So I think we've actually fixed that there problem.
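With strcmp and its header in place, the string version of the search looks roughly like this. As before, this is a sketch under stated assumptions: the lecture reads s with get_string inside main, while here s is a parameter of a made-up function, search_strings, and const char * stands in for CS50's string type.

```c
#include <stdio.h>
#include <string.h>   // declares strcmp; omitting it causes the compile error

// Returns 0 if s is one of the pieces, 1 otherwise.
int search_strings(const char *s)
{
    const char *strings[] = {"battleship", "boot", "cannon",
                             "iron", "thimble", "top hat"};

    for (int i = 0; i < 6; i++)
    {
        // strcmp returns 0 when the two strings are equal
        if (strcmp(strings[i], s) == 0)
        {
            printf("Found\n");
            return 0;
        }
    }

    printf("Not found\n");
    return 1;
}
```

The easy mistake here is writing if (strcmp(...)) without the == 0, which would treat every non-matching string as a match, since any nonzero value counts as true in C.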
But if we go back to this code for a moment, it's indeed the case per the documentation that == 0 is what I want to do. Why in the world would strcmp be designed to return a positive or a negative number too? It's not returning true or false. It's returning one of three possible values: zero, negative, or positive. Why might it be useful? Yeah. >> Um, you could kind of like compare which of the strings is like greater. >> Yeah, super clever. So, if you're passing in two strings, it's great to know if they're equal. But wouldn't it be nice if this same function could also help us sort these strings ultimately and tell me which one comes first alphabetically? And technically, it's not going to be alphabetically. It's going to be, a cute phrase, ASCIIbetically, because it's actually going to look at the ASCII values of the characters and do some quick arithmetic and tell you which one comes first and which one comes later, which is enough, as we'll eventually see, for actually sorting these strings as well. So in short, the documentation will tell me that I should check not only for zero if I care about equality, but if I care about inequality, that is, checking if one comes first or last, I should check whether something is less than zero or greater than. But for this demonstration implementing linear search, I don't care about comparing them for inequality. All I care about is that they are in fact the same or not in this case. All right. Well, let's go ahead and do one other example of sort of linear search, but let's make the problem more like that actually in week zero of searching a phone book. So, let me go back to VS Code here. Close search.c and let's make an actual phone book. So, I'm going to say code phonebook.c. And then inside of phonebook.c, let's use our same header files. So, include cs50.h, include stdio.h, and let's include in advance string.h. Then let's, as before, do int main(void). No command line arguments today.
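The three-way return value of strcmp discussed above can be explored with a tiny wrapper. The function name order is invented here; it collapses whatever negative or positive number strcmp returns into -1, 0, or 1, since only the sign is meaningful.

```c
#include <string.h>

// Returns -1 if a comes before b, 1 if a comes after b, and 0 if equal,
// using the character-by-character ordering that strcmp applies.
int order(const char *a, const char *b)
{
    int result = strcmp(a, b);
    if (result < 0)
    {
        return -1;
    }
    if (result > 0)
    {
        return 1;
    }
    return 0;
}
```

The ASCIIbetical caveat shows up with mixed case: uppercase letters have smaller ASCII values than lowercase ones, so Zebra is ordered before apple even though a dictionary would put them the other way around.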
Then inside of here, let me give myself first an array of strings. How about some names in the phone book? So I'm going to say string names[] equals, and then three names just to make a demonstration: Kelly and David and, say, John Harvard here. But if it's a phone book, I need more than just names. So let me go ahead and give myself another array. string numbers[] equals. And now the same phone numbers we used in week zero for the three of us. +1-617-495-1000, same for both Kelly and me, so +1-617-495-1000 again. And then as before, if you'd like to text or call John directly, you can do so at +1-949-468-2750, and semicolon. So one question first. I obviously declared our names to be an array of strings because that's what text is. Why have I also declared phone numbers to be strings and not integers? Because a phone number literally has number in the name of it. Yeah. >> Yeah. So even though we have phone numbers in the US, even though we have social security numbers and a bunch of other things that we call numbers, if you have non-digits in those values, you have to actually use strings, because if it's not an actual integer, but it does have things like pluses or dashes or parentheses or any other form of punctuation, as is common in the US and other countries for phone numbers in particular, you're going to actually want to use strings and not numbers.
The same goes for corner cases: if you're not from, say, the US, and back home you're in the habit of dialing zero first to make like a local regional call, you don't want to have a leading zero in an integer, because mathematically, as we know from grade school, leading zeros, zeros that come first, have no mathematical meaning. They're going to disappear effectively from the computer's memory unless we store them in fact as characters in strings in this way. Okay, with that said, let's go ahead and ask the human now, after having declared those two arrays, for the name they want to look up the number of. So let's say string name = get_string, and let's go ahead and ask the human for the name for which to search. Then let's use a for loop as before: for int i = 0, i less than 3, which again for demonstration purposes I'm just hardcoding today, i++. And then in the for loop I'm going to use our new friend strcmp. If the return value of strcmp, passing in names[i] and the name the human typed in, equals equals zero, signifying that they are in fact the same, well, that means we found the location i where the person's name is. So let's go ahead and print out Found. But just to be fun, let's print out whom we found. So %s\n, and then output there the number, which is going to be in the corresponding numbers array at that same location i. We'll return zero, and at the very end of this program let's go ahead and print out Not found if we get that far and return one. All right. So, a little more complexity this time, but notice I'm comparing the names just like a normal person would in your iOS app or your Android app when looking for someone's name. But what I care about is getting back the number. So, that's why two lines later, I'm printing out the number that I found at location i, not the name, because I already know the name. All right.
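The two-array phone book described above can be sketched as a lookup function. This is an approximation of the lecture's program rather than a copy: lookup is a made-up name, the name arrives as a parameter instead of via get_string, and the function returns the number (or NULL when the name is absent) instead of printing it.

```c
#include <stddef.h>
#include <string.h>

// Returns the number stored at the same index as name,
// or NULL if name is not in the phone book.
const char *lookup(const char *name)
{
    // Two parallel arrays: names[i] is assumed to belong with numbers[i]
    const char *names[] = {"Kelly", "David", "John"};
    const char *numbers[] = {"+1-617-495-1000", "+1-617-495-1000",
                             "+1-949-468-2750"};

    for (int i = 0; i < 3; i++)
    {
        if (strcmp(names[i], name) == 0)
        {
            // Compare names, but hand back the number
            return numbers[i];
        }
    }
    return NULL;
}
```

Nothing in the language ties the two arrays together; the pairing exists only because both initializer lists happen to be written in the same order.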
In my terminal window, let's go ahead and make this phonebook, ./phonebook. Let's go ahead and search for John, whose number is hopefully indeed exactly that number. So, suffice it to say, this code too does work. This is a linear search because I'm searching left to right. These aren't actually sorted alphabetically by name, let alone number. So, I think we're doing well here, but I don't necessarily love this implementation. Even if you're new to programming, what might you not like about how I've implemented a phone book in the computer's memory? Why is this maybe not the best design? Yeah. >> Like there's a correspondence between names and numbers. So like having two different >> Okay. Yeah. And I would say so. You're pointing out that we have this duality. We've got two arrays. They're the exact same length. And it just so happens that location zero's name lines up with location zero's number, and location one and location two. But we're kind of on the honor system here, whereby the onus is on us to make sure we don't screw this up and we make sure we always have the same number of names and the same number of numbers and, moreover, that we get the order exactly right. We are just trusting that when we print out the ith number, so to speak, it lines up with the ith name. So that's fine, and honestly, for three people, who really cares? It's fine. But if you think about 30 people, 300, 3 million, well, we're not going to hardcode them all here, but even in some database that we'll store them in later in the course, it feels like just trusting that we're not going to screw this up is asking for trouble. And indeed, a lot of programming is just that, like not trusting yourself and definitely not trusting your colleague not to mess something up, but programming a bit more defensively and trying to encapsulate related information a little more tightly together and not just assume, as on the honor system, that these two independent arrays will line up.
But at this point, we have no means of solving this problem unless we give ourselves just a bit of new functionality and syntax. So I used this phrase earlier to kick things off: data structures. It's like how you structure your data in the computer's memory. Arrays are the simplest of data structures. They just store data back to back to back from left to right contiguously in memory. But they all have to be, as we've seen, the same kinds of values. int int int or string string string. There's no mechanism yet for storing an int and a string together and then another int and another string together, let alone two strings, two strings, two strings that are somehow a little bit different. But it would be nice if C gave us an actual data type to store people in a phone book such that we could create an array called people inside of which is going to be a whole bunch of persons, if you will, back to back to back, and I want two of them. So wouldn't it be nice if I could literally use this code in C? Well, decades ago when C was invented, they didn't give us a person data type. All we have is int and float and char and bool and string and so forth. Person was not among the available data types. But we can invent our own data types, it turns out. So in C, what we can do, if we want persons to exist and every person in the world shall have a name and a phone number for now, we can do this: string name, string number. Now that's a decent start, but it's going to be kind of a stupid implementation if I then just do string name1, string name2, string name3, string name4. We've already started down that road last week and decided arrays were a better solution. But here's an alternative when you want to just store related data together. I can use these two keywords in C, typedef struct, which, albeit terse, just means define a new type that is a data structure.
So, multiple things together. Inside the curly braces, you literally put the two things you want to relate together, string name and string number, and then outside the curly braces, you specify the name you want to give to this brand new custom type that you have invented. Stylistically, you'll see that style50 prefers that the name actually be on the same line as the last curly brace, which looks a little weird to me, but that's what industry tends to do, so so be it. But these several lines together tell C: invent for me a new data type called person, and assume that every person in the world has a string called name and a string called number. And now I can use this new data type in my own code to solve this problem a little bit better. So, in fact, let me go ahead and do this as follows. I'm going to go back to VS Code here. And at the very top of my code, above main, just to make this available to not only main but maybe any future functions I write, I'm going to say typedef struct, as we saw on the screen. Inside of my curly braces, I'm going to say string name and string number. And then I'm going to name this thing person. Now, I'm going to go about using this, and I'm going to go ahead and delete my previous honor-system approach of having names and numbers in separate arrays. And I'm instead going to give myself an array of people. We could call it persons, but I'm trying to be somewhat grammatically correct. So, I'm going to say people bracket three to give myself an array called people, inside of which is room for three persons, inside of each of which is room for a name and a number. So, how do I now initialize these values? I'm going to hardcode them, that is, type them manually. But you can imagine using get_string or some other function to get this data from the human themselves. I'm going to say go to the people array at location zero and access the name field. And this is syntax we haven't seen yet, but it's not that hard.
You literally use a dot, a single period, to say go inside of that structure and access the name field, the name attribute, so to speak. And let's set that equal to Kelly. Then let's go into that same array location, people bracket zero, and set the number for the zeroth person to be +1-617-495-1000. Then let's go ahead and do the same thing for people bracket 1. Set that person's name to, for instance, mine, David. Then let's do people bracket 1 dot number equals, quote unquote, the same as Kelly's, because we're both in the directory. So +1-617-495-1000. And then lastly, people bracket 2 dot name equals, quote unquote, John, for John Harvard. People bracket 2 dot number equals +1-949-468-2750 in this case. And now the rest of the code is almost the same. I'm going to now, on the new line 24, still ask the user what name they want. I'm going to still iterate from 0 to 3 because there are still three elements in this array, even though each has two values within. And I'm going to compare now not names but people bracket i dot name, to go access the name of that i-th person and compare it to the name that the human has typed in. And when I find that person, I'm going to go into the people array at location i but print out the number instead. So all we've done here is add this dot notation, which allows you to access the inside of a data structure. And all we've done is introduce up here some new C keywords that let you invent your own data types, inside of which you can put most anything you want. I have chosen a string name and a string number. All right, let me go ahead and open my terminal window and clear it from before. Let me do make phonebook to make this version. So far so good. Make phonebook. Enter. I'm going to go ahead now and search for, say, John. And I have again found his number. So this is still correct.
But even though this took more minutes in terms of the voiceover and it took more lines of code, it's arguably better designed now, because at people bracket zero is an actual person and everything about them. At people bracket one is another person and everything about them, and so forth. This is what we mean by encapsulate. You can think of these curly braces as sort of hugging these data types inside of the data structure together, so as to keep them together in the computer's memory as well. All right. Well, just to set the stage, literally, as we'll strike the lockers and put something else up: the efficiency of binary search as implemented by Caitlyn was predicated on Kelly having sorted the values in advance. But of course, we've only considered now the running time of searching for information using two algorithms, and there can be many others in the real world, but those are two of the most canonical. We found that binary search was faster than linear search, but it required that we sort the data. So, to your question earlier, maybe we should consider just how expensive it is, in terms of time, money, space, humans, to sort data, especially a lot of data, and then decide whether or not it's worth using something like binary search or perhaps even something else. So the next problem we'll solve today is, ultimately, given a generic input and output. The input to our next problem is going to be unsorted data, so, like, numbers out of order, and the output should be sorted data. So for instance, if we pass in 7 2 5 4 1 6 0 3, I want whatever black box is implementing my sorting algorithm to spit out 0 1 2 3 4 5 6 7. So that's going to be the question we answer. But first, I think it's time for some delightful Hello Panda chocolate biscuits. Let's take a 10-minute break, and snacks are now served. All right, we are back. And recall that the cliffhanger on which we left was: how do we go about sorting numbers?
Well, here are some numbers, eight of them in fact, from 0 to 7, but currently unsorted. We don't quite have enough Monopoly boards for everyone, but we do have some delightful Super Mario Brothers Pez dispensers. If I could get eight volunteers for this final demo up here. Oh, and not a lot of hands. Okay. All right. One, two, three, four, five, six, and let's go farther back. Seven, and eight. How about... All right. Come on up. Hopefully I counted properly. Come on over. Upon arrival at the stage, go ahead and grab your favorite illuminated number and stand in that same order at the front of the stage, if you all could. Welcome to the stage. All right, grab your favorite number. Stand in that same order. All right, good. And one, two, three, four, five, six. I definitely said one through eight. Who is the number eight then? Okay, we need an eight. Come on down. All right. Well, technically we need a four, but come on down. Yeah. All right, grab the four, and let me start from this end first, if you want to give a quick hello and a little something about you. >> Uh, hi, my name is Cameron. I'm a first year and I want to study mechanical engineering. >> Welcome. >> Hi, I'm Charlotte. I'm also a first year and I'm in Canaday F. >> Welcome. >> Hi, I'm Ella. I'm also a first year and I'm in the >> Hi, I'm Precious. I'm also a first year. I'm there. >> Hi, I'm Michael. I'm just an Eventbrite guest. >> Yeah. >> Hi, I'm Marie. I'm a first year and I'm in Canaday. >> Welcome. >> Hi, I'm Rick. I'm a first year and I'm in Holworthy. >> Welcome. >> Nice. >> I'm Jaden. I'm a first year in Holworthy and I really like free stuff. >> Okay. Well, let's see, then, if we can't award all these Super Mario Brothers Pez dispensers. First, notice, of course, that all eight of our volunteers are completely out of order, but in an ideal world, we would have the smallest number over here. Go over there. Number zero. Wait a minute. Seven. Let's go over here. Two. Okay. Five. Okay.
Make yourselves look like that. [laughter] No Pez. It's okay. All right. So, 7 2 5 4 1 6 0 3. Okay. We won't do the introductions again, but now we have a list of numbers completely out of order. And wouldn't it be nice if zero were eventually over here, seven were all the way over there, and everything else was sorted from smallest to largest? Well, if you all could go ahead and sort yourselves from smallest to largest. Go. All right. And Jaden, what was your algorithm for doing that? >> Um, I know that I have the least number, because I don't think anybody has a number less than zero. So I put myself at the very start of the line. >> Okay. And I assume, Precious, what was your algorithm? >> I knew I had the largest number. So I just had to be at the end of the >> Okay, fair. So you guys got the easy ones. Number four, how about you? >> I knew three was before me and five was after me. >> Nice. So number four didn't actually have to move, coincidentally. But as for five and three and two and one and six, they probably had to take into account some additional information. Who's to their left? Who's to their right? And it just kind of worked. But it didn't look very algorithmic, if you will. It looked very organic and obviously correct. But I'm not sure that same approach would work well if we had not eight, but 80 or 800 or 8,000 pieces of data. So let's see if we can't formalize this a little bit. Let me take the mic, and if you guys could reset yourselves to those same original positions, from seven on the left to three on the right. Let me propose a couple of algorithms, canonical ones if you will, and see if maybe we can't formalize, step by step, what to do. So the first thing I'm going to do, given all of these numbers, is just try to select the smallest number. Why? To Jaden's point earlier, I just want to put the smallest number over here. At least that's a problem I can solve. It's very well defined. It's a nice bite out of the problem. So seven. Okay, smallest so far.
Two, that's smaller. So I'm going to remember that two is now the smallest number I've seen. Not five, not four. One is even smaller. So, I'm going to remember one. Not six. Zero, that's pretty good. But I'm going to check the whole list. Maybe there's negative one or something like that. But no, three. So, I'm going to remember that zero was the smallest element I found. Let's select Jaden and put Jaden over here. But before Precious or anyone else moves, we don't really have room for you. Like, Precious is in the way, because if this is an array of eight integer values, well, we can't just kind of make room over here, because if you think back to last week, we might have some garbage values there, or something else is going on. We don't want to change data that doesn't belong to us. So what to do with Precious? Well, Precious, maybe you can go over there. So you just take Jaden's spot, and we'll swap these two values accordingly. Now, though, Jaden is in the right space, which is good, because now I can move on to the second problem. What's the next smallest element, that's presumably greater than zero? Well, at the moment, two is the next smallest element. Not five, not four. Ooh, one is the next smallest element. I'm going to remember that. Not six, not seven, not three. Okay, so number one, if you could go to the right location, but I'm afraid we're going to have to evict number two to make room. All right, let's do this again. Zero and one are in good shape. So now I think I can ignore them as complete. Five is the current smallest. Nope, four now is. Nope, two now is. Six, no. Seven, no. Three, no. Okay, so two is the next smallest. So let's swap two and five. And now I've solved three out of the eight problems. Let's do this again. Four is at the moment the smallest. Not five, not six, not... Oh, three is now the smallest. So, let's swap four and three, which unfortunately is making the four problem a little worse.
Like, he belongs there, it would seem, but I think we can fix that later. So, now half of the list is sorted. Five is the next smallest. Six and seven. Ah, four. Now we've got to fix the four. So, four goes back there. Now I've messed up the five, but we'll come back to that. All right. Six. Seven. Okay. Five. Let's put you where six is. And now one more mistake to fix. So, seven. Okay. Six and seven need to swap. And now I've solved eight problems in the aggregate. So it's complete. Now, to be fair, my approach is clearly way slower than your approach, but you all were working in parallel, whereas I was doing it more methodically, step by step. And I dare say my algorithm is probably going to be more translatable to code. And indeed, what I just acted out is what the world would call selection sort, whereby on each iteration, each pass in front of the humans, I was selecting the smallest element I could find. All right. How else could I do this, though? So, let's do something that's maybe a little more organic, like your approach, where you were actually comparing who was next to you. Go ahead and reset yourselves one final time to this arrangement. Seven on the left, three on the right. And let me propose again to walk through the list again and again. But let me focus more narrowly on the problem right in front of me, because I felt like I was taking a lot of steps back and forth, back and forth. Maybe we can chip away at some of that wasted time. Let's compare seven and two. They're obviously out of order. So, let's just immediately swap you two, if we could. All right. Now, seven and five, clearly out of order. Let's swap these two. Seven and four, out of order. Let's swap these two. Seven and one, out of order. Let's swap these two. Seven and six, out of order. Let's swap these two. Seven and zero, out of order. Swap these two. Seven and three, out of order. Swap these two. So, a lot of work for Precious there. But I've now indeed solved one of the eight problems.
Moreover, I don't need to keep addressing the seven problem, because notice that Precious has essentially bubbled her way up to the end of the list. And indeed, that's going to be the operative term here. Another algorithm that computer scientists everywhere know is called bubble sort, whereby the goal is to get the biggest elements to just bubble their way up to the top, or the end, of the list one at a time. Now, am I done? Well, no. Clearly not. There's still stuff out of order, except for Precious. Indeed, I have solved one of these eight problems. And now, fine, I'll go back, and I'm just going to try this same logic again. Two and five, good. Five and four, nope, swap those. Five and one, nope, swap those. Five and six are good. Six and zero, nope, swap those. Six and three, nope, swap those. And I already know that Precious is where she needs to be. So, I think I'm done with the second of eight problems. And I'll do this a little faster now. Two and four are good. Four and one, swap. Four and five are good. Five and zero, swap. Five and three, swap. And now we've solved three problems. Let me reset. Two and one, swap. Two and four are good. Four and zero, swap. Four and three, swap. And now I've solved half of the problems. Four out of eight. We're almost done. One and two are good. Two and zero, swap. Two and three are good. Okay. And now we're done with five out of the eight problems. One and zero, swap. One and two are good. Those are all good. And let me just do a final sanity check. Everything now is sorted. So now I'm done solving all eight of those problems. So, you all were wonderful. We need the numbers back, but Kelly has some delightful Pez dispensers for you on the way out. If you want to head that way, just leave the numbers on the shelves. And a round of applause for our eight volunteers for helping to act this out. [applause] Thank you. So, let's see if we can't formalize what these volunteers kindly just did with us.
Starting with the first of those algorithms. Thank you. Namely, selection sort. Let's see if we can't slap some pseudocode on this, thinking of our humans now as, more generically, an array. So we had the first person at location zero, and we had the last person at location n minus 1. And just for clarity, so that you've kind of seen the symbology, this obviously is going to be location n minus 2. This is location n minus 3, and so forth, until, sort of dot dot dot, you hit the other end that we've already written out. So that's just how we would refer to all of our eight volunteers' locations, or in this case 1 2 3 4 5 6 seven locations, but with dot dot dot in the middle connoting that this can be a much, much larger array. So here's some pseudocode for the first algorithm, selection sort. For i from 0 to n minus 1, so from the first element to the last element, find the smallest number between numbers bracket i and numbers bracket n minus 1. In other words, if you're starting i at zero, look at specifically every lighted number between location zero and location n minus 1. When you have found that smallest element, swap it with the number at location i, which starts, again, at zero. That's how we got, I think, Jaden into place at the very beginning. Then i, by nature of how for loops work, gets updated from 0 to 1, so that we do the same thing. Find the smallest number between numbers bracket 1, so the second element, through the eighth element, because this number is unchanged. n is the total number of values, so the end point there is not changing. Once we've found the second smallest person, we swap them with location i, aka one. And that's how we got the number one into position, and then the number two, and then the number three, and number four. So this, then, was selection sort in pseudocode form. And that allowed us to actually go through this list again and again and again in order to find the next smallest element.
So what was happening, a little more methodically, if it helps just to map that symbology of the bracket notation and the i's? If this is where we started, with location i, and we considered everything between there and location n minus 1, essentially I traversed this whole list from left to right, literally walking in front of our volunteers, looking at each element, and the first element I saw was seven. At the moment, that was the smallest element I had found. And who knows, in a different list, maybe seven would be the smallest element. So I kind of stored it in a variable in my mind. But I then checked two and remembered, no, no, two is clearly less than seven. Now I'm going to remember two. Okay. Now I'm going to remember one when I find it. Then I'm going to remember zero when I find it. And then what I did, once I found Jaden with the value of zero lighted up, was move that value to here, evict Precious, recall, and move Precious over to the location that we had freed up. Why? Why all this sort of back and forth? Well, you have to assume with an array that you're not entitled to the memory over here. You're not entitled to the memory over here if you've already decided that you have seven lockers or eight people. You have to commit to the computer in advance. That's why we typically put the number in the square brackets, or the compiler infers from the curly brackets how big the array actually is. All right. And suffice it to say, when I went through this again and again and again, I did the same thing over and over. Now, you might have thought me sort of dumb for having asked the same questions again and again, like I was surprised to discover the number one, I was surprised to discover the number two, even though on my very first pass I literally looked at all eight of those numbers. But you have to think about what memory I'm actually using. Now, I certainly could have memorized all of the numbers and where they are.
But I propose that, just very simply, I was using, like, a single variable in my brain just to keep track of the then-smallest element. And once I was done finding that and solving that problem, I moved on to do it again and again. But that's going to be a trade-off. And this is going to be thematic in the coming weeks, whereby, well, sure, you could use more memory, and I could have been smarter about it, and maybe that would have improved or hurt the running time of the algorithm. There's often going to be a trade-off between how much memory and how much time you actually use. So we'll discover that over time. So how fast or slow is selection sort? Well, consider: when I had eight humans on stage, I first went through all n of them. But how many comparisons did I make? Really, I was doing n minus 1 comparisons, because if I've got n people, I've got to compare the smallest number I've found against everyone else. And you compare n people left to right n minus 1 times total. So on the first pass, I was asking n minus 1 questions. Is this the smallest? Is this the smallest? Is this the smallest? n minus 1 times. Once I solved one problem, when we got Jaden into Jaden's right place, then I had one fewer problem. Then one fewer problem, and so forth. So, it was like n - 1 steps plus n - 2 steps plus n - 3 steps plus dot dot dot plus one final step once I got to the last of the eight problems. Now, if you remember kind of the cheat sheet at the back of your math books, say, growing up, you'll note that this series here can be more simply written as n * (n - 1) / 2. And if you've not seen that before, just take on faith that this is identical to this series of numbers up here. So, now we can just kind of multiply this out. So that's technically (n^2 - n) / 2, which is great. If we multiply that out, that's n^2/2 - n/2. We're getting too into the weeds.
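Written out, the comparison count described above is the standard arithmetic series. The concrete check with n = 8 is added here for illustration and matches the eight volunteers on stage.

```latex
(n-1) + (n-2) + (n-3) + \cdots + 1
  \;=\; \frac{n(n-1)}{2}
  \;=\; \frac{n^2 - n}{2}
  \;=\; \frac{n^2}{2} - \frac{n}{2}

% Check with n = 8:
7 + 6 + 5 + 4 + 3 + 2 + 1 \;=\; 28 \;=\; \frac{8 \cdot 7}{2}
```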
Let's whip out our big O notation now, whereby we can wave our hands at the lower-order terms and only care about the biggest, most dominant term. Mathematically, in this expression, if you plug in a really big value of n, which is going to matter more? The n squared, the 2, the n, or the other 2? The n squared. The others absolutely contribute to the total value. But if you plug in a really big value, the dominant force is going to be this n squared, because that's really going to blow up the total value. So we can say that selection sort, when analyzed in this way, is on the order of n squared steps, because I'm doing so many comparisons so many times. So if that's the case, the question then is what is indeed not just its upper bound but maybe its lower bound, as we'll eventually see. So for selection sort, for now, let's stipulate that it's indeed in big O of n squared. And that's actually the worst of the algorithms we've seen. Like, that's way slower than linear search, because at least linear search was in big O of n. Selection sort is n squared, which of course is n * n, which is, and will feel, much, much slower than that. So what if, though, we consider the lower bound of selection sort? All right, maybe it's bad in the worst case, but maybe it's really good when the numbers are mostly sorted. Unfortunately, this is the same pseudocode for selection sort. We make no allowance for checking the list to make sure it's already sorted. And in fact, that's kind of a perverse case to consider for any algorithm. What if the problem's already solved? How's your algorithm going to perform? Like, if all of my volunteers, as they kind of almost did accidentally, had started lining up roughly in order. Suppose they literally had been in order from 0 to 7. Well, my stupid algorithm would still have me walking back and forth, back and forth, back and forth. Why? Because the code literally tells me do this this many times, and every time I do that, find the smallest element.
So it's going to be sort of a stupid output, because the list is not going to be changed at all, but my code is not taking into account in any way the original order of the numbers. So, no matter what, this is to say that if we consider, whether with the lockers or the humans, the omega notation for this algorithm, even in the best case, where the data is already sorted, it is crazily also n squared. Now, I could certainly change the pseudocode, but selection sort as the world knows it is more of a demonstrative algorithm, or sort of a quick and dirty one. Its running time is going to be in omega of n squared. And now we can actually deploy our theta notation, because the big O notation is n squared and the omega notation is n squared, one and the same. We can also say that selection sort is in theta of n squared, which is not great, because that's annoyingly slow. So maybe the solution here is don't do that. Let's use bubble sort instead, the second algorithm, where I just compared everyone side by side again and again. Well, here's some pseudocode for bubble sort, which you can assume applies to the same kind of array, from zero on up to n minus 1. Here's one way to write bubble sort. Repeat the following n times. For i from 0 to n minus 2, if the number at location i and the number at location i + 1 are out of order, swap them. And there's kind of an elegance to this algorithm, in that, like, that's it. And you just assume that when you go through the list, with i from 0 to n minus 2, this is how I was effectively comparing elements zero and one, one and two, two and three, three and four, dot dot dot, six and seven. But notice I didn't say eight. There were eight total people. Why do we go from 0 to n minus 2 instead of from 0 to n minus 1? Yeah? >> We already checked the last one. >> Not quite. So it's not that we've already checked the last one. I'm saying, with this line of code here, we never even go to n minus 1.
Technically... >> If we have n minus 1, it is going to compare against n minus... because that's... >> Exactly, because we're doing this simple arithmetic here. We're checking the current location, i, and i + 1. You can think of these as my left and right hand. Left hand is pointing at zero. Right hand's pointing at one. I don't want to do something stupid and have my left hand point at n minus 1, because then my right hand, arithmetically, when you add one, is going to point at n, which does not exist. That's beyond the boundary of the array, because the array goes from zero to n minus 1. So, just a little bit of a safety check there to make sure we don't walk right off the end of the array. But we do this n times because, recall, Precious ended up being where seven needed to be, at the very end of the list. But that didn't mean there weren't seven more problems still to solve, zero through six. So I did it again and I did it again, and, per its name, bubble sort, the biggest element bubbled up first, then the next biggest, then the next biggest, then the next biggest; that is, seven, then six, then five, then four, and we got lucky on some of them, but eventually we finished with zero. So how do we analyze this thing? Well, we could also technically do this n minus 1 times, as an aside, if you're thinking through that I'm wasting some time, because we get one for free. Once we get to solving seven problems, you get the eighth one for free, because that person is obviously where they need to go. So when we had these numbers initially and we were comparing them with bubble sort, again, left hand, right hand, it's like treating this as i and this as i plus 1, and we just kept swapping pairwise numbers if in fact they were out of order. So all this is saying is what our humans were doing for us organically. So how do we actually analyze the running time of this? Last time, I just kind of spitballed that it was n minus 1 steps plus n minus 2 steps.
Well, you can actually look at pseudocode sometimes, and if it's neatly written, you can actually infer from the pseudocode how many steps each line is going to take. For instance, how many steps does this first line take? I mean, like, literally n minus 1. The answer is right there, because it's saying to the computer, or to me acting it out, repeat the following n minus 1 times. All right, so that's helpful. How many steps does this inner loop induce? Well, you're going with i from 0 to n minus 2. So that's actually n minus 1 total steps, not n. And then this question here, if numbers bracket i and numbers bracket i plus 1 are out of order, is a single question. It's like our Boolean expression. We'll call it one. I mean, maybe you need to do a bit more work than that, but it's a constant number of steps. It doesn't matter how big the list is. Comparing two numbers is always going to take the same amount of time. And then swapping them, oh, I don't know, it's going to take like one or two or three steps, but constant. It doesn't matter what the numbers are; it takes the same amount of work. So, let's stipulate, let me rewind, stipulate that the real things that matter are the loops. These constant numbers of steps, who really cares? But the loops are what are going to add up as n gets large. So this really, then, is, if this is the outer loop and this is the inner loop, think about our two-dimensional Mario square from week one. We did something on the outside and then something on the inside to get our rows and columns. This is equivalent to (n - 1) * (n - 1). If we do our little FOIL method, that's n^2 - n - n + 1; combine like terms, n^2 - 2n + 1. Who cares? This is ultimately going to be on the order of big O of n squared, only because, again, if you ask yourself, when I plug in a really big value for n, which of these is really going to contribute most to the answer, it's obviously going to be n squared again, and we can ignore the lower-order terms.
So this doesn't seem to have made any progress. Selection sort was on the order of n squared; bubble sort, based on this analysis, is also on the order of n squared. Maybe we're getting lucky in the lower bound. So on the upper bound for bubble sort, it's indeed n squared, as was selection sort. But with this pseudocode for bubble sort, unfortunately, we were not doing anything clever to catch that perverse case where maybe the list was already sorted. After all, consider if the list was sorted from 0 to 7. I was still asking all the same darn questions. Even if I did no work, I was going to repeat that n minus 1 times, back and forth, making no swaps but making all of those comparisons. But here's an enhancement to bubble sort that we can add, that selection sort didn't really have room for. I can say, after one pass of this inner loop, walking from left to right, if I made no swaps, quit. So, put another way, if I traverse the list from left to right and I make no swaps, I might as well just terminate the algorithm then, because there's clearly no more work to be done. All right. So, based on that modification, the lower bound of bubble sort's running time would be said to be in omega of n, because I'm minimally going to need to make one pass through the list. You can't possibly claim that the list is sorted unless you actually check it once. And if there are n elements, you're going to have to look at all n of them to make sure that they're in order. But after that, if you've done no work and made no swaps, there's no reason to traverse the list again and again and again. So bubble sort can be said to be in omega of n, because indeed we can just terminate after that single pass if we've done no work. We can't say anything about theta, because big O and omega are not one and the same. But that does seem to have given us some savings. Unfortunately, it really only saves us time when the list is already, or mostly, sorted.
But in the average case and in the worst case, odds are they're both going to perform just as badly, on the order of n squared. In fact, let's take a look at a visualization that'll make this a little clearer than our own humans and voices might have explained. Here is a bunch of vertical purple bars, made by a friend of ours in the real world. And this is an animation that has a bunch of buttons that let us execute certain algorithms. A small bar represents a small number. A big bar represents a big number. And the goal is to get them from small numbers, or small bars, to big numbers, or big bars, left to right. So I'm going to go ahead and click on selection sort initially. And what you'll see from left to right is, in pink, the current smallest element that's been discovered, but also, in pink, the equivalent of my walking across the stage left to right again and again and again, trying to find the next smallest element. And you'll see clearly, just like when we put Jaden at the far left, the smallest element ended up over here. But it might take some time for Precious, for instance, or number seven, to end up all the way over on the right, because with each pass we're really just fixing one problem at a time, and there are n problems total, which is giving us on the order of those n squared steps. And now the list is getting shorter, so we're at least saving some work, in that you don't have to keep touching the elements you've already sorted, just as I didn't. So now selection sort is complete. Let's visualize bubble sort instead. So let me re-randomize the array, just so we're starting with a random order. Now let's click on bubble sort. And you'll see the pink bars work a little differently. They connote which two numbers are being compared at that moment in time, just like my left hand and right hand going left to right.
And you'll see that even though it's not quite as pretty as selection sort, where I was getting at least the smallest elements all the way to the left, here we're just fixing pairwise problems, but the biggest elements, like Precious's number seven, are indeed bubbling their way up to the top one after the other. But as you can see, and this is where n squared is sort of visualizable, we're touching these elements or looking at them so many times, again and again. We are making so many darn comparisons. This is taking frustratingly long. And this is only, what, a few dozen bars or numbers. You can imagine how long this might take with hundreds, thousands, or millions of values. I dare say we're going to have to do better than bubble sort and selection sort, because we're not even done yet. I'm just trying to give you the satisfaction of getting to the end, and now we are. But neither of those algorithms seems incredibly performant, because it's still taking us quite a bit of time to actually get to that there solution. So how can we actually do better than that? Well, we can try taking a fundamentally different approach. And this is one technique that you might have encountered in math or even in the real world, even if you haven't applied this name to it. Recursion is a technique in mathematics and in programming that allows you to take a fundamentally different approach to a problem. And in short, a recursive function is one that's defined in terms of itself. So if you had, like, f of x equals f of something on the right-hand side of a mathematical expression, that would be recursive in that the function is dependent on itself. More practically, in the world of programming, a recursive function is a function that calls itself. So if you are writing some function in C and in that function you call yourself, you actually have a line of code that says call that same function by the same name, that function is recursive.
Now this might feel a little weird, because if a function is calling itself, it feels like this is the easiest way to get into an infinite loop. Why would it ever stop if the function is calling itself, calling itself, calling itself, calling itself? We're going to have to actually address that kind of problem. But in the real world, or rather in this class already, we've actually seen implicitly an example of this, including today as well as in week zero. So here is that algorithm for searching the doors of the lockers. And recall that we did this check at the very top: if there are no doors left, return false. Otherwise, we had these conditions. We said if the number is behind the middle door, return true, because we found it. But things got interesting here, where I said, else if the number is less than the middle door, then search the left half. Else if the number is greater than the middle door, then search the right half. Well, at that point in time you should be asking me or yourself, well, how do I search the left half? How do I search the right half? Well, here you go. On the screen right now is a search algorithm. And even though it says down here search the left half or search the right half, which is like, well, how do I do that? We'll just use the same algorithm again. And this is how, in terms of my voiceover, you end up searching the left half of the left half or the right half of the left half or any such combination. This line here, search left half, and this line here, search right half, are each representative of a recursive call. This is an algorithm or a function that calls itself. But why does it not induce an infinite loop? Like, why is it important that this line and this line are written exactly as they are, so as to avoid this thing just forever searching aimlessly? Yeah? >> There's the condition at which it stops. >> We do have this condition at which it stops.
But more importantly, what is happening before I make these recursive calls? >> Exactly. I'm recursing, that is, calling myself, but I'm handing myself a smaller problem, a smaller problem, a smaller problem. It would be bad if I just handed myself the exact same number of doors and just kept saying, "Search these, search these, search these," because you would never make any progress. But just like our volunteers earlier, so long as we divide and conquer and we search smaller and smaller numbers of doors, eventually we're going to bottom out and either find the number we're looking for or not. So, generally, we're going to call these kinds of conditions, which just ask a very obvious question and want an immediate answer, base cases. Base cases are generally conditionals that ask a question to which the answer is going to be yes or no right then and there. A recursive case, by contrast, like these two down here, is when you actually need to do a bit more work to get to your final answer. You call yourself, but with a smaller version of the problem. So we could in fact, in week zero, have written this similarly. If you go back in your mind to week zero, we had more of a procedural approach, so to speak. When we were searching the phone book, I proposed that this induced what we called loops on line 8 and line 11, which just literally said go back to line three. And that was more of a mechanical way of inducing a loop structure. But if I really wanted to be elegant, I could have said, well, you know what? Lines 7 and 8 together really just mean search the left half. And lines 10 and 11 together really just mean search the right half. So let's condense these pairs of lines into shorter instructions. Search the left half of the book. Search the right half of the book. I can then delete two blank lines, and now I have a recursive algorithm for searching a phone book.
It's a little less obvious, because you have to ask yourself when you get to line seven or nine, wait a minute, how do I search the left half or the right half? And that's when you need to realize you start the same algorithm again, but with a problem that's half as large. In week zero, we did the procedural approach, where we literally tell you what line of code to go to, but today we're offering a different formulation, a recursive approach, where it's more implicit what you should do. And we'll see now a couple of examples from the real world, so to speak. So, here's a screenshot from Super Mario Brothers 1 on the original Nintendo Entertainment System. Let me go ahead and get rid of some of the distraction, like the ground and the mountains there. And here we have a sort of half pyramid, not unlike the one you implemented in problem set one. But this is an interesting real-world physical structure in that you can define it recursively. Like, what is a pyramid of height four, if you will? Well, just to be a little difficult, a pyramid of height four is really just a pyramid of height three plus one more row. Okay. Well, what is a pyramid of height three? Well, a pyramid of height three is really just a pyramid of height two plus one more row. Well, what's a pyramid of height two? Well, a pyramid of height two is really just a pyramid of height one plus one more row. Well, what's a pyramid of height one? A single brick on the screen. And I sort of changed my tone with that last remark to convey that this could then be our base case, whereby I just tell you what the thing is without kicking the can and inviting you to think through what a smaller structure is plus one more row. Whereas every other definition I gave you of a pyramid of some height was defined in terms of that same structure, albeit a smaller version thereof. So we can actually see this in the real world. Let me go ahead and pull up one thing here.
Give me one sec before I flip over. Here I am on google.com. If you'd like a little computer science humor here, if you ever Google search for recursion and hit enter, you'll see a joke that computer scientists at Google find funny. Haha. One, two laughs. Does anyone see the joke? I did not make a typo, but Google's asking me, did I mean recursion? And if I click on that, I just get the same haha page. Okay. All right. That didn't go over well. Anyhow, there are these Easter eggs in the wild everywhere, because computer scientists are the ones that implement these things. But let's go ahead and actually implement, for instance, a version of this in code. Let me go back over here in a moment to VS Code. And in VS Code, let me propose that in my terminal window, let me create one of two final programs. This one's going to be called iteration.c, just to make clear that this is the iterative, that is, loop-based, version of a program whose purpose in life is to print out a simple Mario pyramid. I'm going to go ahead and include cs50.h at the top as well as stdio.h. I'm not going to need string.h. I don't need any command-line arguments today. So this is going to start off with int main(void). And now I'm going to go ahead and ask a question like, give me a variable called height of type integer, and ask the human for the height of this Mario-like pyramid. And then let's assume for the moment that I've already implemented a function called draw whose purpose in life is to draw a pyramid of that height, semicolon. So I've abstracted away for the moment the notion of drawing that pyramid. Now let's actually implement draw, whose purpose in life again is to print out a pyramid akin to the one we saw a moment ago, like this here on the screen. Well, in order to print out a pyramid of a given height, I think I need to say void draw(int n), for instance, because I'm not going to bother returning a value.
I just want this thing to print something on the screen. So void is the return type. But I do want to take as input an integer, like the height of the thing I want to print. I can call this argument or parameter anything I want. I'll call it n for number. So how can I print out a pyramid that again looks like this? Well, I'll do this quicker than you might have in problem set one. But it seems obvious that on the first row I want one brick. On the second row I want two. On the third I want three. On the fourth I want four. So it's actually a little easier than problem set one in that it's sloped in a different direction. So let me go ahead and do exactly this in code. Let me say for int i equals 0, i less than n, the height, i++. So this is really going to be for each row of the pyramid. Let me go ahead now and, in an inner loop, for int j equals 0, let's do j less than i + 1, for reasons we'll see in a moment, and then j++. And then inside of this loop let's just print out a single hash, no new line, but at the end of the row let's print out a single new line to move the cursor to the next line. Now why am I doing this? Well, this represents for each column of the pyramid. And if you think about it, on the first row, which is row zero, I actually want to print not zero bricks, but one brick. So that's why I want to go from zero to i + 1, because if i is zero, i + 1 is 1. So my inner loop is going to go from 0 to 1, which is going to give me one brick. It's a little annoying to think about the math, but this just makes sure that I'm actually getting bricks in the order I want them. And then it's going to give me two bricks and then three and then four. And between each of those rows, it's going to print a new line. So let's go ahead and do make iteration to compile this code. Ah, I messed up. Why do I have a mistake on line eight of this code? Let me hide my terminal and scroll back up. It seems Clang, my compiler, does not like my draw function. Yeah?
Yeah, I forgot the prototype. So this is the one and only time where it seems reasonable to copy-paste. Let's grab the prototype of that function up here and go ahead and teach the compiler from the get-go what this function is going to look like, even though I'm not defining it until line 13 onward. All right, let's go ahead and make iteration again. ./iteration. Enter. Let's do a height of, say, four. And voila, now I've got that there pyramid. So, I did it a little quickly, and it's certainly to be expected if it took you hours on problem set one to get the other type of pyramid printed. But the point for today is really to demonstrate how we can print a pyramid like this using what I'd call iteration. Iteration just means using loops to solve some problem. But we can alternatively use recursion by reimplementing our draw function in a way that's defined in terms of itself. So let me go into my code here, and I'm actually going to leave the prototype the same. I'm going to leave main the same. But what I'm going to go ahead and do is delete all of this iterative code that's doing things very procedurally, step by step by step, with loops. And I'm instead going to do something like this. Well, if I want to print a pyramid of height n, what did I say earlier? Well, a pyramid of height n is really just a pyramid of height n minus one plus one more row. So, how do I implement that idea in code? Well, let me go back in code here and say, well, if a pyramid of height n first requires drawing a pyramid of height n minus one, I think I can just write this, which is kind of crazy to look at because you're calling yourself in yourself, but let's see where this takes us. Once I have drawn a pyramid of height n minus one, that is, of height three for instance, what remains for me to do is to myself print one more row. And so to print one more row, I think I can do that really easily with fewer loops.
I can do for int i equals 0, i less than n, i++, and then very simply in this loop I can print out a single hash, one at a time. At the end of this loop I can print out a new line, but no more nesting of loops. What I've done here is print one more row, and here I've done print a pyramid of height n minus one. I'm not quite done yet, but I think this is consistent with my verbal definition that a pyramid of height four is a pyramid of height three, which I can implement per line 16, just draw me a pyramid of height n minus one, and then I myself will take the trouble to print the fourth and final row. But something's missing in this code. Let me go ahead and try running it. Let's see what happens. Make... oh, darn it, I meant to call this something else. So I'm going to do this. I'm going to close this version here. I'm going to rename iteration.c to recursion.c to make clear that this version is completely different. Let me now go ahead and make the recursion version. And huh, Clang is noticing that I have screwed up. On line 14, it says error: all paths through this function will call itself. And Clang doesn't even want to let me compile this code, because that would mean literally just looping forever, effectively, by calling yourself. So what am I missing in my code here? If I open up what we're now calling recursion.c in my editor, what's missing here? Over here? Yeah, I'm missing a base case. And I can express this in a few different ways, but I would propose that before I do any drawing of anything at all, let's just ask ourselves if there is anything to draw. So, how about if n equals zero, well then don't do anything, just return. You don't return a value. When your return type is void, it means you don't return anything. So you just return, period, or return, semicolon.
Or just to be super safe, I could actually do something like this, which is arguably better practice, just in case I get into this perverse scenario where someone hands me a negative number. I want to be able to handle that and not print anything either. So just to be safe, I might say less than or equal to zero. I'm not doing one, because if I did do one, then I would want to at least myself print out one brick, which is fine, but I'd have to change all of my code a little bit. So I think it's safer if my base case is just, if n is less than or equal to zero, you're done. Don't do anything. And this then ensures that even though thereafter I keep calling draw again and again and again and the problem's getting smaller and smaller, from four to three to two to one, as soon as I hit zero, the function will finally return. So let's go ahead and open up my terminal. Rerun make recursion to make this version. It did compile this time. ./recursion, enter. Let's type in four, cross my fingers, and this too prints the exact same thing. And even though it doesn't look like fewer lines of code, I would offer that there's an elegance to what I've just done. Whereas with the iterative version with all the loops it was very clunky, like step by step, just print this and print that and have a nested loop inside of another, with this, especially if we distill it into its essence by getting rid of my comments like this, and frankly I can get rid of the unnecessary curly braces, only because for single lines in conditionals you don't need them, this is arguably a very beautiful implementation of drawing Mario's pyramid, even though it's calling itself, and arguably because it is calling itself. Questions then on this idea of recursion or this implementation of Mario? Yeah? >> Are there no scope issues involved if you like? >> Good question. Are there any scope issues involved? Short answer, no.
However, the current value of i, for instance, will not be visible the next time the function is called. It will have its own copy of i, if that's what you mean. And we'll talk next week in more detail about what's going on here. And in fact, I probably can't break this in class very easily. But it turns out if I use a very large value for height, let's just hit a lot of zeros and see what happens. That was too many. Let's see what happens. That's also too many. Let's see what happens there. That's the first time at least I in class have encountered this error. You might have encountered this weird bug in office hours or in your problem set, and that's fine if you did. We'll talk about what this means next week too. But this is bad. Like, this clearly hints at a problem in my code. However, the iterative version of this program would not have that same error. So this relates to something involving memory, because it turns out, as a little teaser for next week, each time I call draw, I'm using a little more memory, a little more memory, a little more memory, a little more memory, and my computer only has so much memory. This program in its current form is using too much memory. There are workarounds to this, but that is a trade-off for the elegance we're gaining in this solution. So, what's the point of all this? And how did we get sidetracked by Mario? There's another sorting algorithm, the third and final one that we'll consider today, that actually uses recursion to solve the problem not only elegantly, arguably, but also way faster somehow than bubble sort and selection sort. And in essence, it does so by making far fewer comparisons and wasting a lot less work. It doesn't keep comparing the same numbers again and again. Here in its essence is the pseudocode for merge sort. Sort the left half of the numbers, sort the right half of the numbers, then merge the sorted halves.
And this is kind of a weird implementation of an algorithm, because I'm not really telling you anything. It seems like you're asking me, how do I sort numbers, and I say, well, sort the left half, sort the right half. It's like someone being difficult. And yet implicit in this third line is apparently some magic. This notion of merging halves that are somehow already sorted is actually going to yield a successful result. As an aside, we're actually going to need one base case here, too. So, if you're only given one number, you might as well quit right away, because there's nothing to do. So, we'll toss that in there as well. And base cases are often for zero or one or some small-sized problem. In this case, it's a little easier to express it as one, because if you have one element, it's indeed already sorted. So, what does it mean to merge two sorted halves? Well, let's actually consider this. I'm going to reuse some of these same numbers here. I'm going to put my one, my three, my four, and my six on the left. And these together represent a list of size four that is indeed sorted. And then I'm going to put four other numbers on the right there that are similarly sorted as well. And by merging these two lists, I mean start at the left end of this list, start at the left end of this list, and just decide one step at a time which number is the next smallest. And then I'm going to put it on the top shelf to make clear what is sorted. So if my left hand's pointing at this list and my right hand's pointing at that one, which hand is obviously pointing to the smaller element, left or right? Like, the right. So I'm going to grab this, and I'm going to use a little more space up top here and put the zero in place. And then I'm going to point to the next element there. So my left hand has not moved yet. It's still pointing at the one. My right hand is pointing at the two. Which number comes next? Clearly left.
So, I'm going to grab the one and put it up there and update where my left hand is pointing. So, now I'm pointing at the three here and the two there. What comes next? Obviously the two. What comes next? Obviously the three. What comes next? Obviously the four. What comes next? Obviously the five. But notice my hands are not going back and forth, back and forth, back and forth like in any of the algorithms thus far. I'm just taking baby steps, moving them only to the right, effectively pointing at each number once and only once. What comes next? Six. And now my left hand is done. What comes last? The number seven. So what I just did is what I mean by merge the sorted halves. If you can somehow get into a scenario where you've got a small list sorted and another small list sorted, it's super easy now to merge them together using that left-right approach, which I'll claim only takes n steps. Why? Because every time I asked you a question, I was taking one bite out of the problem. There are eight bites total. I asked you eight questions, or I would have if I had verbalized them all. So, it's n steps total to merge lists of that size. So, what then is merge sort? Merge sort is really all three of these steps together, only one of which we've acted out. Two of which are sort of cyclical in nature. They're recursive by design. So what does this mean? Well, let's start with this list of eight numbers, which is clearly out of order: 6 3 4 1 5 2 7 0. And let's apply merge sort to this set of numbers. And I'll do it digitally here, because it'll take forever to keep moving the numbers up and down physically. So let's move it to the top just to give ourselves a little bit more room. And let me propose that we apply merge sort. What was the very first step in merge sort? At least of the ones we highlighted, the juicy steps. What's the first step in merge sort? Sort the left half. Yeah. And then the second step was going to be sort the right half.
And then the third step was going to be merge the sorted halves. So let's see what this means by actually acting it out on these numbers. So here are my eight numbers. Let's go ahead and sort the left half. Well, the left half is obviously going to be the four numbers on the left. And I'm just going to pull them out just to draw our attention to them over here. Now I have a list of size four, and the goal is to sort the left half. How do I sort a list of size four? >> Well, yes, but just be more pedantic. Like, how do I sort any list using merge sort? >> Sort the left half. So let's do just that. So of a list of size four, how do I sort this? Well, I'm going to sort the left half. How do I sort a list of size two? >> Sort the left half. All right. Well, I'm just going to write the six here. How do I sort a list of size one? I just don't. I'm done. That was the so-called base case, where I just said return. Like, I'm done sorting the list. Okay, so here's the story recap. Sort the left half. Sort the left half. Sort the left half. And I just finished sorting this. So what comes next? Sort the right half, which is this. And now I've sorted the left half of the left half of the left half, which is a big mouthful. But what do I do as a third and final step when sorting this list of size two? Merge them. This part we know how to do. I point left and right. And I now take the smallest element first, which is the three. Then I take the six. And now this list of size two is sorted. So if you rewind in your mind's eye, what step are we on? Well, we have now sorted the left half of the left half. So what comes after the left half is sorted? We sort the right half. So we're sort of rewinding in time, but that's okay. I'm keeping track of the steps in my mind. I want to now sort this list of size two. How do you sort a list of size two? Well, you divide it into a list of size one. How do you sort this? You're done. You then take the other right half and you sort it. Done.
Now you merge the two sorted halves. So I point at the four and the one. Obviously the one comes first, then the four. Now I have sorted the right half of the left half of the original numbers. What's the next step, now that I have the left and right halves of this list of size four sorted? Merge those. So same idea, but with fewer elements. I'm pointing at the three and the one. Obviously the one comes first. Now I'm pointing at the three and the four. Obviously the three comes next. Pointing at the six and the four, the four comes next. And now the six comes last. Now I have sorted the left half. And it's intentional that 1 3 4 6 is the original arrangement of the lighted numbers I had on the shelves a moment ago. All right, it's a long story, it seems. But what comes after sorting the left half of the original list? You sort the right half. So let's put those numbers over here. How do I sort a list of size four? Well, you sort the left half. How do you sort this thing of size two? You sort the left half. You sort the right half. And now you merge those together. How do I now sort the right half of the right half? Well, I sort the left half. I sort the right half. And then I merge those together. Now I have sorted the left half and the right half of the right half of the original elements. What's next? The merging: 0, 2, 5, and 7. Now we're exactly where we were originally with the lighted numbers. I've got 1 3 4 6, the left half, sorted, and 0 2 5 7, the right half, sorted. What's the third and final step? Merge those two halves.
Of course: 0, 1, 2, 3, 4, 5, 6, and 7. And hopefully, even though there were a lot of words coming out of my mouth as I was acting this out, there wasn't a lot of back and forth. Like, I definitely wasn't walking back and forth physically, and I also wasn't comparing the same numbers again and again. I was doing sort of different work at different conceptual levels, but that was only, what, like three levels total. It wasn't n levels on the board visually. So where does this get us with merge sort? Well, with merge sort, it would seem that we have an algorithm that I claim is doing a lot less work. The catch, though, is that merge sort requires twice as much space, just as we saw when I needed two shelves in order to merge those two lists. So, how much less work is actually going to be possible? Well, let's consider the analysis of the original list and how we might describe its running time in terms of this big O notation. Hopefully, it's not going to be as bad as n squared ultimately. So, here are some breadcrumbs, as if I hadn't kept updating the screen and deleting numbers once we moved them around. Here are sort of traces of every bit of work that we did. We started up here. We did the left half, the left half of the left half, the right half of the right half, and then everything else in between. And you'll see that essentially I took a list of size eight and I did three different passes through it: at this conceptual level, at this conceptual level, and at this one. And each time I did that, I had to merge elements together. And if you kind of think about it here, I pointed at four elements here and four elements here. And in total, I pointed at eight elements. So there were n steps here for merging. And if you trust me, I'll claim that on this level conceptually, there were also eight steps. I wasn't merging lists of size four, but I was merging two lists of size two over here and two more lists of size two over there.
So if you add those up, those are n total steps or merges, if you will. And then down here, this was kind of silly, but I was ultimately merging eight single lists altogether into the next higher conceptual level. So from a list of size eight, we sort of had three levels of work, and on each level we did n steps, the merging. So where does the three come from? Well, it turns out if you have eight elements up here, the relationship between 8 and 3 is actually something formulaic, and we can describe it as log base 2 of n. Why? Because if n is eight, if you don't mind doing some logarithms here, log base 2 of 8 is the same thing as log base 2 of 2 to the third power. The log base 2 and the 2 cancel each other out, which gives you exactly the number three that I sort of visualized with those traces on the screen. Which is to say, irrespective of the specific value of n, the big O running time of merge sort is apparently not n squared, but it's log n times n, or more conventionally n times log n, because you're doing n things log n times. Technically base 2, but we don't care about that generally for big O notation. And indeed, in big O notation we would say that merge sort is on the order of n log n. That's its big O running time, sort of at the upper bound. What about the lower bound? Well, there's no clever optimization in our current implementation as there was for bubble sort. And so it turns out the lower bound would be in omega of n log n, and therefore in theta of n log n as well, because big O and omega are in fact, in this case, one and the same. And if we actually go back to our visualization from earlier, give me just a moment to pull that up here. In an earlier demonstration of these algorithms, we had a side-by-side comparison of all the comparisons. But here, if I go ahead and randomize it and click merge sort, you'll see a very different and clearly faster algorithm.
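The arithmetic behind that n log n claim can be written out as follows. This is just a restatement of the spoken derivation in LaTeX notation (assuming the amsmath package for \text), using the same n = 8 example.

```latex
% Number of levels for n = 8:
\[
  \log_2 8 = \log_2 2^3 = 3
\]
% n merge steps on each of the \log_2 n levels:
\[
  \underbrace{n}_{\text{merge steps per level}} \times
  \underbrace{\log_2 n}_{\text{levels}}
  = n \log_2 n \in O(n \log n)
\]
% For the eight numbers above: 8 \times 3 = 24 merge steps in total.
```

The base of the logarithm is dropped inside big O because logarithms of different bases differ only by a constant factor.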
Even though the computer's speed has not changed, it's touching these elements so many fewer times. It's wasting a lot less time because of this cleverness, where it's instead dividing and conquering the problem into smaller and smaller and smaller pieces. And to give this a final flourish, since that was, yes, faster, but not necessarily obviously faster than other things that we've done, how might we actually compare these things side by side by side? Well, in our final moments together, let's go ahead and dramatically, and for no real reason, just dim the lights so that I'll hit play on a visualization that at the top is going to show you selection sort with a bunch of random data. On the bottom it's going to show you bubble sort with a bunch of random data. And in the middle it's going to show you merge sort. And the takeaway ultimately for today is the appreciable feel of the difference between big O of n squared and now big O of n log n. [music] >> All right. The music just makes sorting more fun. But that's it for today. We will see you next time. [applause] [music] >> All right. This is CS50, and this is week four, the week in which we take off the proverbial training wheels that have been the CS50 library and reveal to you all the more what's going on underneath the hood of a computer in terms of its memory. We'll also talk about files and how you can actually persist information for a long time, whether it's a file you've downloaded or, today, one that you've created yourself. But first, I just wanted to share some artwork that two of your classmates, Avery and Marie, kindly made before class, which is a picture made out of Post-it notes, some green, some purple, which collectively from where you are looks like what? >> Yeah. So indeed it's a cat that they made using only zeros and ones, or green and purple pieces.
And in fact, even though this is fairly low resolution in that it only has a few pixels this way and a few pixels this way, it's actually representative of how computers do actually store images underneath the hood. So let's actually start there. In fact, we've had this bowl of stress balls for some time here on the lect turn. And if we take a beautiful photo of it, they look a little something like this. Of course, this too is a finite resolution. And by resolution, I just mean how many dots go horizontally and how many dots go vertically. Multiply those two together and you get some number of bytes, maybe in kilobytes, megabytes, or heck, if it's a massive image, it could be even bigger than that. But it is in fact finite. And if we zoom in on this image, you start to see a little more detail. But at the same time, if you keep zooming in, you start to see indeed that there's only finite detail. And when we go really uh zoomed in, you start to see actual dots or pixels as they're called. In fact, on most any screen, any image you look at, if you look close enough by pulling your phone up to your eyes or walking really close to a TV, you may very well see the same thing because any image on a screen like this is represented by hundreds, thousands, millions of tiny little dots called pixels. And each of those pixels has a color that gives it collectively the appearance of stress balls in this case or cats in this case. So in fact among the things we're going to do this week in the problem set is actually have you write code via which you can manipulate your own images um not only to understand what's going on underneath the hood but to apply some of today's most familiar filters so to speak. In fact if we go all the way down here you'll see that this image of course is multiple colors. We've got some white and some red and shades in between. But let's keep things simple for a moment and propose that instead of looking at these dots, we look at these zeros and ones. 
And let me propose that in a picture like this, any zero will be interpreted as black. Any one will be interpreted as white accordingly. If you can see it, what is this a picture of? >> Oh, smiley face is in fact right. Because if you kind of focus only on the zeros and try to ignore those ones, as I can do here for you, you'll see that embedded in that image was in fact this smiley face. Now, this would be a sort of one bit image. You either have a zero or one representing each of the colors. In modern times, we would actually use 16 bits per color, 24 bits for color, maybe even more. And that's how we can get every color of the rainbow instead of just something black and white. But in effect, what's happening here is that if you did have a file on your Mac or PC or phone storing this pattern of zeros and ones and you opened it up in some kind of image program or like the photos app, it would be depicted to you visually as this simply a grid X and Y where some of the dots are white, some of the dots dots are black. All right, so with that said, how what kinds of um representations might be involved here? Well, we can actually rewind to week zero. Recall that we talked briefly about RGB, which just means red, green, and blue, which is one of the most common ways to represent colors inside of a computer. And if any of you have ever dabbled with Photoshop or similar editing programs, or if maybe in high school or earlier you made your own web pages, odds are you're actually familiar with a syntax we're going to see a lot of today. This doesn't add anything intellectually new. It's just an introduction to a common convention for how else we can represent numbers. So, this is a screenshot of Photoshop's color picker. Photoshop being a popular program for editing photos and files. And you'll see here that my selected color looks to the human eye as black. And I've highlighted here how I got that. I chose black by typing in 0 0 0. 
Which also, if you look up here, means that I want zero red, zero green, and zero blue. And yet, we somehow translated it to six zeros instead of just three. Well, if we take a look at another color like white instead, I claim that you can represent white in Photoshop, and today in code, with FF FF FF or equivalently 255 red, 255 green, 255 blue. And here, if you think back to week zero, is maybe a hint at where we're going with this. If you're using an 8-bit number, that means you can count from zero on up to 255. So recall that 255 is the biggest number you can represent with just eight bits. And yet somehow there's going to be a relationship between the 255s and these Fs that we see down here. Let's just run through a few more. If we wanted to represent something like red, we're going to use FF 00 00. If we want to represent green, we're going to use 00 FF 00. And lastly, to represent blue, we're going to use 00 00 FF. So what's going on here? And why do we have this different convention? Well, it turns out in the context of images, and also memory in general, it's just human convention, or programmer convention, to use this alternate representation of numbers. Not the so-called decimal system, but another one that's not all that far off from what we've been doing over the past few weeks. So, here again was the binary system. You've got just two digits in your vocabulary, 0 and 1. Here is the familiar decimal system where you've got 10 instead, 0 through 9. Suppose we wanted a few more digits. Well, we're sort of out of Arabic numerals here, but I could toss into the mix A, B, C, D, E, and F, either in lowercase or uppercase. And in fact, that's what computer scientists do when they want to have more than just 10 digits available to them, as many as 16 digits. And in fact, when you want to use this many digits, you call it hexadecimal, implying that you've got 16 digits, aka base 16. Now, there's an infinite number of base systems. 
We could do base 3, base 4, base 15, base 17 on up. But this is just one of the relatively few conventions that are popular in computing. And let's just tease it apart because we're going to see these kinds of numbers a lot. Well, thankfully, like in week zero, it's the same old number system with which you're familiar, with the columns and the placeholders. It's just that the bases in those columns mean a little something different. So instead of using powers of two or powers of 10, we're going to today use powers of 16. So 16 to the 0 of course is 1. 16 to the first power is 16. So we have the ones column, the sixteens column and so forth. Meanwhile, if we wanted to therefore start counting in hexadecimal, this two-digit number in hexadecimal is of course the number you and I know in decimal as 0 because it's still just 16 * 0 + 1 * 0. This in hexadecimal is how you would represent one, but you would say 01, or zero one, instead of just one to make clear there are two digits. This would be 02, 03, 04, 05, 06, 07, 08, 09. Now things get a little interesting. In the decimal world, we're about to carry the one and give ourselves two digits, 1 and 0. But in hexadecimal, you can keep going. So the next numbers in hexadecimal are going to be 0A, 0B, 0C, 0D, 0E, 0F. And now things get interesting again. What probably comes after 0F, even if you've never seen hex before? >> So, one zero. You still carry the one as before. This goes back to zero. And why is this now appropriate? Well, how many numbers did we just count through? Well, we started at 00. We went up through 0F. And that's a total of 16 combinations. So, the highest we counted, let me rewind. This number here, of course, is going to be 1 * F. But what is F? Well, let's rewind further. In fact, let's have our little cheat sheet here. If we want to have these digits at our disposal, I dare say that they stand for 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15. So F is just going to represent the number 15. 
So if we now fast forward back to where we were just counting from zero on up through 0A through 0F, we land here. This of course is 16 * 0 + 1 * F, which is 1 * 15. So this is how in hexadecimal you would represent the number 15. This in hexadecimal is how you would represent the number 16 instead. 15 to 16. This is not ten. That's how you would pronounce it in decimal. This is one zero in hexadecimal because 16 * 1 + 1 * 0 gives us of course 16. Now we could do this toward infinity but we won't. 1 1, 1 2, 1 3, dot dot dot, all the way up to FF. So quick mental math. 16 * F + 1 * F, that is to say 16 * 15 + 1 * 15, is, any guesses? >> It is in fact 255. You don't even have to do the math because if you just think about where we were going with this, indeed we saw pairs of Fs in the Photoshop screenshots because this is how a computer would represent the number you and I know in decimal as 255, by just using two Fs. So why do we care about hexadecimal? Well, it turns out that it's just convenient to use two hexadecimal digits to represent numbers because a single hexadecimal digit can be used to represent four bits at once. For instance, let me go ahead and explode this by putting a little bit of space between the two digits here. And let's consider how you would represent F. Well, if F is 15 and you want to represent 15 in binary, I think that's just going to be 1 1 1 1. Now, why is that? Well, one in the eights place plus one in the fours place plus one in the twos place plus one in the ones place indeed gives me 15. So using a single F I can count up, as we've seen already, as high as 15. But of course I've claimed in the past that it's super common to use eight bits at a time, or one byte, to represent any value because that's just a very useful common unit of measure. And so in hexadecimal if you wanted to represent four ones you can say F. 
If you want to represent another four ones, you can just say f, which is to say that f and f together is just like the same as eight ones together, which is how we finally get to the total number of 255 because this is the ones place, the two's place, the four's place, the 8s, 16, 32, 64, 128. But if you group these into clusters of four bits alone, you can represent all of the possibilities from 0 through 15 just using 0 through f. So with one hex digit you can represent four bits which is a long way of saying is it's just convenient for that reason which is why the world tends to use hex when talking about colors and as we'll see memory as well. So in fact let's consider what is meant by memory and what's going on inside of the computer when we've been storing values thus far. Well here's that canvas of memory. I proposed last time uh in uh I proposed last time and before that we can sort of number these bytes arbitrarily but reasonably. This is bite 0 1 2 3 4 5 6 7 dot dot dot and maybe this is bite 15. That's fine. Nothing wrong with that. But in the real world, any programmer would actually think of these locations instead not in decimal notation but in hexadimal notation just because because it's convenience for the reasons discussed. So we would actually number these from zero on up through 9 and then keep going with a b c d e f and so forth. So what does that mean for the other digits? Well, this would be 1 0. This would be 1 1. This would be 1 2 dot dot dot. Here now is 1 9. But here's 1 A, 1 B, 1 C, 1 D, 1 E, 1 F, and so forth, just using hexodimal notation. But there's arguably some ambiguity here. For instance, if you just at a glance were to look at this board and see this address 1 0, is that by 10 or is that byte 16? It's just non-obvious because if you don't know what base system you're working in, which you could infer by looking at the rest of it, it could potentially be ambiguous. 
So in the world of hexadecimal, it's super common to literally prefix any number you ever write in hexadecimal notation with 0x. The zero doesn't mean anything per se, nor does the x. It just means that what follows the 0x is a number in hexadecimal notation, which makes unambiguous the fact that this is 0x10, which if you do the math in decimal again ends up being 16, not of course the number 10. In short, today you're about to see a lot of 0x's and a lot of two-digit or four-digit or eight-digit numbers in hexadecimal notation. Generally we don't care what the numbers translate to. You don't need to do a lot of math, but it's going to be commonplace to see syntax like this. All right, back to sort of normal time. So, here is a line of code, int n equals 50, wherein we might want to declare a variable called n and store a number like 50 in it. Let's actually go ahead and do this, simple as it probably is by now, in a file called, how about, addresses.c. We're going to play around with computer addresses. And in addresses.c, I'm going to do something super simple at first whereby I'm going to include stdio.h. Then I'm going to go ahead and write int main void. No command line arguments here. And then I'm going to declare this variable n, set it equal to the arbitrary but familiar value of 50. And then just so that this program does something mildly useful, let's go ahead and print out with percent i and a backslash n that value of n. So nothing new here. I'm just literally going through the motions of declaring a variable and printing its value. So let's do that. make addresses, Enter. ./addresses. And hopefully I'll indeed see the number 50. So, not all that much going on in the code, but let's consider what's going on in the computer's memory. This line of code and the one after it give us the results of that program, but where is that n ending up? Well, here's my grid of memory. And let's just suppose for the sake of discussion that the 50 ends up down here. 
Maybe there's other things going on in my program. So, this part of my computer's memory is already in use. So, it's reasonable that it could end up in this location here. But what is important is, how many bytes am I using for n? Apparently, >> four. And that's because we've said integers tend to be four bytes, aka 32 bits. So this is at least to scale even though I'm just imagining where it ends up in memory. So that's where the 50 actually ends up. So when I actually call printf and pass in n, clearly the computer is going to that location in memory and actually printing out that value. But that value is indeed at a specific memory address. It's not going to be quite as simple as 0x0 or 0x1 or a small number typically. It maybe is going to be something arbitrary like 0x123, where I'm just making this up. It's an easily pronounceable number in hexadecimal notation. All right. So what can I use that information for? Well, thus far this hasn't been useful to us, but certainly programs we've been writing have actually been making use of this. But with a bit more syntax, I can actually start to see things like this, not just on the screen, but in code. In fact, let me propose that we introduce two new operators in C. So, two new pieces of syntax. One is a single ampersand and one is a single asterisk. And we'll see that the asterisk has a few different uses, but the ampersand has a very simple, straightforward one, which is to just get the address of a variable in memory. So if you've got a variable like n, if you prefix it with an ampersand, as in ampersand n, you can actually ask the computer at what address this variable is stored. You can find out if it's indeed 0x123 or something else altogether. So in fact, let me go ahead and do this by going back to my addresses.c program and let's see if we can print out not the value, which is obviously going to be 50, but let's actually print out the address thereof. 
So up here in my code, I'm going to change the n on line six to be ampersand n instead. And I'm going to go ahead and make one other change because yes, n lives at an address. And yes, that address is technically a number, but it's conventional not to use percent i to display that number, but rather another piece of syntax, which is just a new format code, which you don't often need. This is more demonstrative than useful, I would say. But percent p is going to be what we use when we want to print out an address of something in the computer's memory. So, back to VS Code. One more change. I'm going to change my percent i to percent p instead. So, at this moment, we should see a version of the program that's not going to display 50 anymore, but something like 0x123, but probably a bigger number than that because my computer has way more memory than that address suggests. So, let's again make addresses. Let's run ./addresses. And indeed, this variable at that moment in time apparently lives somewhere in the computer's memory at address 0x7FFD3C34ECC. All of those are hexadecimal digits. It would be painful to do the mental math to figure out what the numeric address is. But we're seeing it indeed in this common hexadecimal notation, which is not going to be often useful for us as humans. But the computer is and has been using this information for some time. So in fact what we're about to introduce is admittedly one of the more complicated concepts in computing, and in C in particular, namely a topic called pointers. And I will say today more so than ever might feel like a bit of a fire hose. In fact, all these years later, I still remember the day on which I finally understood this topic, which was not the day of the lecture in which it was introduced, but it was in like the back right corner of the Eliot House dining hall. I was sitting down during office hours with my teaching fellow and he finally helped that light bulb go off over my head. 
So, if some of this feels a little arcane today, it just comes with time and with practice like everything else. So, what is a pointer? A pointer is going to be a variable that can store an address. Now, yes, that address is technically just a number, like an integer, but we distinguish between integers that we care about like 50 and things we might do math on, and a pointer, which in this case is just going to be the address of a variable uh the address of a value in memory. So, what does this mean? Well, we can start to do things like this. I can declare my variable n as before and set it equal to the value 50. But I can actually get the address of n and put that address in another variable. And that variable we now call a pointer. So P is going to be the name of this variable. It's going to store the address of N which we can get using the amperand. But there's one more piece of syntax which I promised before. This asterisk here. And the asterisk here means that this variable P stores the address of an integer, not an actual integer per se. It's weird looking syntax. It kind of looks like multiplication, but it isn't. It's just the developers of C decades ago decided to use an asterisk, even though it's admittedly nonobvious what it's doing. But in this context, when you see an asterisk right after a data type like int, it just means that the variable in question is not going to be an int per se, but an address of an integer. Okay, so let's put this to the test using a line of this code in my own file here. Let me propose that we do this. Let me go back to VS Code here. Let me introduce this additional variable int star p as it's typically pronounced. Set that equal to amperand n and then do the exact same thing as before. Let's not print out amperand n but let's actually print out the value of p itself because p is now equivalent to amperand n. So let me go back to VS Code. Let me do make addresses again. And huh, I did something wrong and stupid here. 
This was not meant to be the moral of the story. What did I do wrong? Yeah. >> Yeah, I just missed the semicolon. So, still making those mistakes here. All right. And let me clear my screen again and do make addresses, Enter. ./addresses. And now I should indeed see the address of n. I just so happen to temporarily store it this time inside of a variable called p. Now, just so you've seen it, it turns out that when using this syntax of a star to declare a so-called pointer and an ampersand over here to get the address of something, you might see in online references and such different formattings of this. This is the canonical way to declare a pointer: int, space, then the star, then without a space, the name of the variable. However, it will work, and you will sometimes see, that the star is over here or the star is in the middle. But again, we would recommend stylistically that it just go here. Admittedly, I think it would have been clearer if the star were over here, making clear that it's related more to the int than it is to the variable name. But this is simply the convention. So this means, hey computer, give me a variable called p that's going to store the address of an integer. And the ampersand is just saying, hey computer, tell me the address of n. And it's the compiler and computer itself that decided where to put that variable in memory. Questions? >> Would you get an error if you didn't put the asterisk? >> You would. And let's take a look. So, let me go ahead and clear my terminal. Let me go ahead and delete the star before the variable p. Now, let me go ahead and do make addresses again. And indeed, I'm getting an error. Incompatible pointer to integer conversion initializing int dot dot dot. And even though that's a lot of big words, it kind of says what it means. You're trying to go from a pointer on the right to an integer on the left, which is just not appropriate here. 
Yes, at the end of the day, they're all numbers, but it's more properly a pointer or an address on the right, but a little old int now incorrectly on the left. So, the fix there is just to indeed put it back. Other questions on this new syntax? Yeah. you do like >> indeed. To recap the question, can you use the address of operator to find the address of other data types like strings? Absolutely. And we'll do that with a couple of examples today as well. We're just using ins to keep it super simple initially. Other questions on these addresses and pointers. >> So we still use variables even if they're not integers. Is that right? >> Correct. Correct. Even if it's not an int question, we'll come back to other data types in a little bit. You're still going to use the star. That is the same syntax for everything. And yes, >> can you tell the computer I want to store these variables in this address? >> Oh yes. Can you tell the computer you want to store a variable in this address? That's where we're going in just a bit. Indeed. Now that we have the ability to find out the address of something in memory, stands to reason that we can go to that address ourselves and maybe poke around and actually put values there. And in fact, that's that's among our goals for today. So let's consider how we might get there. So here now is my canvas of memory and let me propose that the number 50 happened to get stored in the variable n down there at bottom right just because and that's probably ox123 or in reality a much larger address but it's easier and quicker for us to just pretend it's at 0x123. What is actually happening in code when I declare P and put a value there? Well, recall a moment ago I declared P to be a pointer to an integer. that is the address of an integer. So what's happening in memory is this. If n is down here and happens to be at address ox123 when I actually assign p to amperand n that just literally takes that address of n and puts it inside of p. 
Now p, as an aside, happens to be pretty big. It turns out by convention on most systems a pointer, that is a variable that stores an address, is actually going to be eight bytes large. It's going to be 64 bits. Why is that? Our computers have so much darn memory nowadays, in the gigabytes, that you need to be able to count higher than 4 billion. As an aside, if you only used 32 bits for your pointers, you could only count, recall, as high as roughly 4 billion. 4 billion bytes is 4 gigabytes, equivalently. That would mean your computers could not have 8 gigabytes of memory, 16 gigabytes of memory. Your servers couldn't have tens of gigabytes of memory. We use 64 bits or eight bytes nowadays for pointers because our computers have that much more memory. All right. So what is p storing? Literally just an address like this. So when we wrote this code just a moment ago, what the computer did, and has been doing for the past several weeks, is literally just finding the location of n in memory and plopping that value inside of p, which itself is taking up a bit of memory, by convention more memory, 8 bytes in this case. The thing is, who really cares about this level of detail? Typically, as programmers, it's useful to understand what's going on, but rarely are we going to care precisely about where things are in memory. Today is really about just kind of looking at what's going on underneath the hood. So, in fact, we can abstract away most of my computer's memory, I would propose, because at the moment, all we care about is p existing and n existing. So, who really cares what else is going on? And frankly, generally I am not going to care that n is at address 0x123, just that it is at an address that happens to be 0x123. And so when a programmer or computer scientist is talking about design on a whiteboard, or frankly in sections and office hours on a whiteboard, we rarely care what the actual addresses are. 
So we generally abstract the specific address away and literally represent pointers with arrows on the screen or on the whiteboard or the like. This just means that P is a variable that points to the number 50 in memory. Okay. Questions on this mental model for what a pointer is. It's a pointer in like very much the literal sense. [snorts] Okay. So, if you're on board with that, let me propose that we consider now um what these things look like maybe more physically. In fact, we've we've got a couple of mailboxes here to make clear with a little metaphor that uh here is a physical representation of our variable say P labeled as such. Inside of this is presumably going to be the address of some actual value. That value at the end of the story is going to be the value of N which recall for consistency is that address ox123. So what happens when you actually try to uh locate a value in memory is analogous to sort of looking up something inside of these mailboxes which if you think of your computer's memory as hundreds or thousands of little mailboxes maybe more apartment style where you've just got rows and columns of mailboxes as opposed to individual ones for single family homes. Each of those mailboxes can contain the address of some value in memory. And so what's really happening is that if this is P, not drawn to scale because they only make mailboxes so large. Inside of P is going to be an address like ox123. And just to be dramatic since there's a big football game this weekend, uh here is a Harvard foam finger metaphorically like this pointer is like pointing at that value over there. And in fact, we're going to see as you asked a moment ago, can we actually go to an address in memory? We don't yet have the syntax for that, but we're about to. Yes, you can. And in fact, if I follow what I'm pointing at, open up this location in memory, voila, there is the 50 in question. 
So, anytime we're talking about values or the addresses thereof, you can think of it analogously as being like physical mailboxes, one of which might contain a useful number like 50, one of which might contain the address of that value. And we now have the syntax, we'll see, to actually go from one to the other. Let me actually go back into VS Code here, where in the most recent version of my program what I was doing was getting the address of n and storing it in p, and then I was literally printing out p itself. And that's when we saw the big hexadecimal number that is generally not useful, but it's maybe interesting to see that one time. Let me instead, though, introduce another use of that star or asterisk operator that allows us, as was asked a moment ago, to actually go to that address. So in this version of my program, I'm going to keep n equal to 50. I'm going to keep p equal to the address of n. But what I'm now going to do is show you how syntactically I can print out not p, but n, by way of p, following the proverbial foam finger, by printing with percent i, backslash n, the value of n instead. Now, obviously, I could cheat and just say n and print out n like in version one, but that doesn't really demonstrate anything interesting here. However, if I only have p at this point in the story, it turns out you can use the star for another purpose. If you simply prefix your variable name with a star, that is now the so-called dereference operator, which means go to the address in p. So if I now open up my terminal here, do make addresses for this version, then ./addresses and Enter, I now get back the number 50. So what's really happening? On line five, as has been true for several weeks now, we have a variable called n being initialized to the number 50. Then on my next line, six, I'm declaring p as the address of some value, an integer specifically, and putting the address of n in there exactly. 
And then on line seven, I'm actually saying print out an integer, percent i, as we've done for weeks. But what integer? Go to the address in p and print out what you find there. So that's equivalent again to the foam finger, which is over there pointing at the value I actually want to print out instead. Okay. So, usefulness. Well, I think we can get there by taking a look at one of the little white lies that we've been telling. In fact, let's turn our attention to strings, which up until now have been a sequence of characters in the computer's memory. A string is a thing in programming more generally, but in C, it technically doesn't exist by this name. You can still use strings in C, just not by calling them s-t-r-i-n-g as the actual data type. But let's start with our familiar code here. Let me go into addresses.c. Let me add our training wheels back in for now and include cs50.h because in this version of my addresses program, what I want to do is declare a string s and I'm going to set it equal to hi, exclamation point. Then as we did in week one, let's go ahead and print out with percent s, backslash n, that value of s. So nothing new, nothing interesting here. So let me just do it quickly and do make addresses, then ./addresses, and we see hi on the screen. So that has all been something we've been taking for granted. But let's consider what is going on underneath the hood of even that program. So the string we've declared exists somewhere in the computer's canvas of memory. So string s equals hi might end up somewhere down here. And I'm going to stop drawing all of the boxes when not necessary. But here we have hi, exclamation point, and, as we discussed two weeks ago, the null character, NUL, which just means the string stops here. So as a quick refresher, even though the word is three characters, it takes up how many bytes? Four. Always, because you need that null terminator. 
All right, so maybe that string could be accessed then by its name S. And we've seen this before. S bracket zero is the first character. S bracket 1 2 and then if you want to poke around, you can go into S bracket 3, but you'll probably see quote unquote null on the screen or the compiler will sort of the computer will sort of remind you that you don't really want to look there at that point. So, three characters accessible via this array syntax. But we know now that everything in the computer's memory is addressable. And maybe that H just so happened to end up at ox123 and the i ends up at ox124 125 126 respectively. Doesn't matter what these numbers are, but because strings are sequences of characters back to back up to back in memory, it must be the case that these addresses are themselves contiguous back to back to back without gaps inside of them. That's how a string has always been stored in memory. It's just an array of characters. All right, so with that said, what really is S? We've thought of S in every program we've used strings in before as just a string. Like that is the sequence of characters or really it's the name of an array. But that's a bit of a white lie because what S really is is going to be a more specific value. Take a guess what is actually going to be the value in S. >> Yeah, the address of if I may that array. So we've got like sort of four possible answers here. A, B, C, and D. Multiple choice. Which of those numbers probably makes sense to store in the variable called S in order to get to this string? What what is S's value? Yeah. >> 0x123 is correct. So we don't talk about this in like week one because like it's already hard to like remember semicolons in week one. Like god forbid start thinking about like what these specific addresses are. S is a string. S. But technically S is and has been since week one a pointer. The address of an array of characters in memory. 
The address specifically of the first character in memory, which is sufficient. Why? Because of this null-terminating convention that we talked about weeks ago, which tells the computer where the string ends. The pointer tells the computer where the string begins. And that's how you get to use just numbers, zeros and ones inside of a computer, to store something as interesting as an actual string. So in fact, let's take a closer look at this. Let me go into VS Code again and, just for the sake of discussion, declare s as before, but instead of printing out the whole string at once, let's go ahead and do this: printf with %p\n, and let's print out s itself initially to see whether it's actually 0x123 or, presumably, a much bigger number. Then after that, let's print out another pointer, another address rather, with %p\n, and now I'd like to print out the address of the first character of s. But let's not get ahead of ourselves. Let me go ahead and make addresses and ./addresses. Okay, there now in this program is the address at which the string itself is stored: 0x5a7143027004. So bigger than 0x123. Well, let's now poke around. What if I want to print out the address of, how about, the first character in that string? Well, recall that s[0] is literally the first character. That is a char. So with what syntax could I get the address of the first character? Well, we haven't learned all that much that's new today. It's just a single ampersand that will get me the address of that character. If I do this for the next character, I can see one after another. And in fact, this is going to have four characters in total, including the null character. So let me copy-paste, which is generally frowned upon, but not for a lecture demo because we're just trying to do this quickly. Let's print out the address of s itself,
and then more specifically the address of s's first character, the address of s's second character, the third, and the address of that null terminator. All right, let's go back and make addresses. Let me go ahead and clear my terminal and run ./addresses. And we see, if I zoom in on my terminal here, the following. s itself contains 0x56199bd00004. And the address of the first character in s, aka s[0], is exactly the same thing. The next character, the I in HI!, is one byte away. The exclamation point is one more byte away. And the null terminator is one more byte away. So again, bigger numbers, but the point is these are indeed just the actual addresses of all of these characters in memory. All right, let me pause for any questions here. Yeah. >> Why do you need a reference for the specific characters but not s? >> Good question. Why do I need the ampersand before the specific characters in s but not s itself? Think about what s actually is. I'm claiming for the moment that s itself is the address of that whole string, which just so happens by design to be equivalent to the address of the first character, because that is the convention humans came up with decades ago to represent a string. Now you might think that you need the address of every character in the string. But no, that's why humans decades ago decided to just terminate every string in memory with the \0, or null terminator, because if you give me the beginning of the string and the end, I can obviously, with a loop, find everything else in between. Other questions? No. All right. Well, what then is this actual thing in memory? It turns out that yes, s is a string, as we've been describing it all this time. But technically, I think we're ready to reveal what little white lie we've been telling or, if you will, what abstraction s actually is in the CS50 library.
The type you've known as string since week one has, all this time, simply been a synonym for char *. So what does this really mean? Well, we saw int *p earlier; here we're seeing char *s. s is the name of the variable, and yes, it's a string, but what is it really? s is the address of a char. And so in week one of the course, in the actual CS50 library, we've told this little white lie by just creating a synonym in the library that makes char *, so to speak, the exact same thing as string, s-t-r-i-n-g, just so that we don't have to think about this level of detail, let alone hexadecimal notation and addresses and pointers and dereferencing and all of this complexity, in the first weeks of the course. It simply abstracts away what a string actually is. And in fact, we've seen this technique before in a more complicated way. If you recall, last week we claimed that you could create a phone book, for instance, using persons, and persons have names and numbers. We created our own type by saying typedef, and that type was a whole structure, which is the complexity part: a structure containing a name and a number. And we gave that data type ultimately the name person. So we've already invented in class our own make-believe data types to create things that didn't come with C itself, like a person. Well, the struct is very specific to what we were trying to do with the phone book, but typedef is more generally useful because it literally allows you to define your own type. So, for instance, if we wanted to create a synonym for int, because we never remember what it is, and call it integer instead, you could simply say typedef int integer; and that would create in your programming environment a data type called integer that is literally equivalent to int. Now, this is not all that useful.
So instead, in the CS50 library, we do use typedef, as in typedef char *string; to tell the computer that char * can instead be spelled as string. And that just means that string ever after is the same thing as saying char *. So all of this time since week one, I could have been doing exactly that if I wanted. And in fact, if I go back to VS Code here, let's simplify this quite a bit and go back to the very first version of the program, wherein I use %s and just print out s's value itself, the string HI!. Well, this of course is going to work as always: it's just going to print out HI! on the screen. But now, if I get rid of the CS50 library and try to recompile this, notice we'll get an error that I think I've seen before. Here we have, if I scroll up to the very first line: use of undeclared identifier 'string'; did you mean 'stdin'? And no, I don't, and no, I didn't a couple of weeks ago when I accidentally did that, but the compiler does not know about the keyword string at the moment. Well, that's fine. Even if I don't have the CS50 library installed on this computer, I can just get rid of the word string, which is a concept but not a keyword in C, and rename it to char *. And now in my terminal window, I can do make addresses again, ./addresses, and voila, we're back in business with no CS50 training wheels whatsoever, because printf knows, given a char *, to go to that address and print, print, print, print until it gets to the null terminator, and then stop printing. There's a loop in there that does exactly that. Questions on char * or what a string actually now is? >> Yeah. In front. >> Good question. How does printf know to keep going until it gets to the null?
The format code, because I've been using %s, which means print a string, instead of %c, which means print a single character. printf sees that %s and is like, oh, I should use a loop to print out all of the characters until the null terminator. If I instead passed in just %c, it would stop after a single character. >> Okay, that makes sense. >> Other questions? >> Good question. Why don't I dereference s in order to print it out? So, let me try that for just a moment here. Why do I not have to, now or any week prior, do *s here? Because after all, if s is the string, I want to go to the string and print it out. Well, the first answer is that printf is doing this for you, because it's being handed the address and it is going to the address for you. So that star is somewhere in printf's implementation. But this is also incorrect conceptually, because yes, s is the string, but more technically today, s is the address of the first character in the string. So I really want to provide printf in this case with the address, not the specific character, because I want it to treat it as a string, not a single character. Indeed, I could use *s if I changed to %c to print out the single character. All right. So let's play around just syntactically for a moment here in VS Code. Let me propose that we still use char *s here and then just demonstrate exactly what's going on. So I'll do exactly what was just asked. I'll use %c, and then I'm going to go ahead and print out, for now, our old week 2 syntax, treating s as an array: s[0], s[1], and s[2]. And I'm using some copy-paste just for time's sake. This of course is not going to do anything all that interesting, but it is going to demonstrate that indeed we have H, I, exclamation point back to back to back in memory.
And if I really want, I could print it all on one line by getting rid of, of course, those new lines. But what more can I do with this syntax? Well, I could take literally the fact that s is the address of the first character in memory. So instead of using this array notation, which we introduced in week two, I could technically go to the address in s. Why? Well, s is the address of the first character of the string. *s means go to that address. And voila, you're at the first character by definition of what s is. So I could print out the first character using *s instead of s[0]. How could I do the same for the next characters? Well, here's where we can actually take advantage of the fact that pointers, and addresses more generally, are in fact numbers, and you can actually do arithmetic on pointers themselves. In other words, there is a concept known as pointer arithmetic, which means that given an address, you can add to it or subtract from it. So for instance, if I want to print out the second character of s, that's kind of equivalent to going to s but then moving over one character. So maybe I should do a little bit of pointer arithmetic and do *(s + 1), with s + 1 in parentheses just so that, like in math class, we do the order of operations correctly. And then down here I could go to s again. But wait a minute, I want to go to s plus two characters away, or two bytes away: *(s + 2). So now I can do make addresses down here. Oh, and I did mess up. Oh, new mistake. Unintentional. Yep, I forgot my parenthesis on the very end here. So that was just user error. Make addresses again, ./addresses. And now I indeed see H, I, exclamation point one more time, using pointer arithmetic instead of our familiar array notation. So what is that array notation? It's what we would generally call syntactic sugar, which is a very weird way of saying it's just nicer syntax.
Like, no one wants to write code that looks like this. It sort of, you know, bends the mind a little bit to read and parse all of this visually. Just s[0] is much more straightforward. But what it's really doing is this. And the computer is essentially converting that bracket notation for us into this more esoteric but correct version instead. All right. What else can I do? Well, just for fun, for some definition of fun, let's go ahead and print out three different strings. And recall that a string is a sequence of characters that starts at some address. So let's first print out the sequence of characters that starts at s. Let's next print out the sequence of characters that starts at s + 1. And let's lastly print out the string that starts at s + 2. Just playing around with the definition of what these pointers are. Let me do make addresses. And oh, not my day. What did I forget? Semicolon. So if it happens to you, it happens to me, too. Make addresses, ./addresses. And now this one's going to be a little curious. I see HI!, then I!, and then just the exclamation point. Why? Because I'm treating a string literally as what it is, a sequence of characters, but I'm giving printf the address of the first character initially, then of the second character, then of the third. And all three of those statements work because all three of those strings happen to be terminated by the same null character. Even though printing I! and the exclamation point alone was not really my intention, that doesn't stop me from being able to do it nonetheless. All right. Well, let's do one other application of this idea. Let me propose that we take a look at our computer's memory here, and let's suppose that we want to start comparing values, because in week one we did a lot of that, and even in week zero we did a lot of that with if and else if and else and so forth.
So let's make this a little more real and also reveal why last week we had to solve an unexpected problem using another string function, namely strcmp, s-t-r-c-m-p. So here, for instance, are two arbitrary variables in memory, i and j, and I gave them both the value of 50, and maybe they indeed end up there, each of them taking up four bytes. Last time, recall that we weren't able to compare two values in memory just by using the == operator unless those values were actually integers. In fact, let's do that. Let me go back into VS Code here, close out addresses, and let's code up another version of my compare program from the past. This time I am going to use the CS50 library just to keep things simple initially. I'm going to include both it and the standard I/O library here. I'm going to give myself main with no command-line arguments. And then in main I'm going to declare exactly what we just saw on the screen: a variable i set to 50, a variable j set to 50. And then we're going to do our old familiar syntax from week one. If i == j, then let's go ahead and print out something like "Same\n". Else, let's go ahead and print out "Different\n". So a super simple program that simply compares two variables that yes, are obviously going to be the same, but let's do this. So let's do make compare, ./compare. They're in fact the same. Okay, so that actually works as intended. But why didn't it work last time when we tried comparing strings, the solution to which was to introduce strcmp? Well, let's go back to VS Code and resurrect that buggy example initially. Instead of using, say, integers, let's go ahead and do this. And I'll rename them just by convention. So my first string will be whatever get_string gives me. So we'll prompt the user for s.
My next string will be called t by convention, and I'm going to ask the user for that. Then down here, instead of using i and j, which are common for integers, I'm just going to use s and t, which are common for strings, and just ask literally the same question as we have in the past. All right, let me go ahead and do make compare and, wow, what's the error? Well, I'll show you the error message. What did I unintentionally do wrong here? Yeah, I'm getting a string, but I'm trying to store it into an int. So this is just frowned upon. So let me go ahead and change that to what I should have typed the first time. Give me a string s and a string t. Now, if I do make compare, we're back in business. All right, let me do ./compare. And I'm going to go ahead and type in, for instance, let's say HI! and HI!, both for s and t, which are, according to the program, different. Now, we've tripped over this before, and recall that the solution was indeed to introduce a function called strcmp. And I explained at a high level: well, that's because you're not just comparing two values. You've got to compare character after character after character. And that's what indeed strcmp does. So let's go ahead and do that. Let me go back into this file. Let's go ahead and include the string library at the top here. And instead of doing s == t, let's do if the string comparison of s and t happens to equal zero, that is, strcmp(s, t) == 0, which per the documentation for the function means they're equal, instead of one coming before or after the other. No, I did not get it wrong this time. I caught it. Um, yes. So, how do we actually go ahead and compare the strings this time? Well, let me go ahead and do make compare, ./compare. And now type in exactly the same thing. HI!, HI!. And now they're in fact the same. And just to demonstrate that this isn't just some fluke, I can type in HI!, for instance, and BYE. And those are in fact different.
So clearly strcmp is doing something useful. But what is it actually doing? Well, first of all, let's make clear that what was a string last week is technically a char * this week. So I can remove that training wheel. I'm still going to include the CS50 library because, as we'll see by the end of class today, get_string and get_int and all of those get functions from CS50 are actually still useful, because it's a pain in the neck in C to get user input without using functions like those. But I'm going to get rid of the data type that we thought was called string. This will still work exactly as before. If I do make compare, ./compare and type in HI! and HI!, we're indeed seeing that they are now the same. So what's actually going on inside of the computer's memory with strings? Well, I would offer that s probably ends up, like, over here in memory. And then maybe it actually has its characters down here. So notice the duality. s, as of now, is an address, which means it takes up eight bytes or 64 bits, but the actual characters, it turns out, end up somewhere else in the computer's memory. And this is what's different about an int. The int i and the int j both ended up exactly where the variables were named. But with strings, the variable itself contains not the string, but the address of the first character in that string, which I claim could end up anywhere else in the computer's memory. So those addresses might be 0x123, 0x124, 0x125, and 0x126, for instance. Meanwhile, s is going to contain literally the address of that first character. When I create t in memory now, it ends up maybe over there, taking up eight bytes of its own, and down here ends up the second thing that I typed in, not at the same address but at 0x456, 0x457, 0x458, 0x459. Now, if the computer were really smart and generous, it could probably notice: oh wait a minute, you typed that thing in already. Let me just point you at the other memory. But that's not how it works.
When you call get_string, you get your own chunk of memory for whatever the human typed in, even if by coincidence it's exactly the same. So t's characters are ending up here. s's characters are ending up here. What value should go in t? >> Exactly, 0x456, because that's the address of the first character in t. So we put 0x456 there. So at this point in the story, we have two strings in memory and two pointers there too. And so in fact, if we kind of abstract that away, it's kind of equivalent to s pointing at the chunk of memory on the left and t pointing at the chunk of memory on the right. So why was string comparison actually necessary? Well, in this case, we wanted to make sure that the strcmp function was handed the address in s and the address in t, so that the strcmp function, written by someone else decades ago, which has its own for loop or while loop, essentially starts at the beginning of each string and compares them character by character by character. That's what it's designed to do. By contrast, when I was using == a few minutes ago, and also last week, incorrectly to compare strings, what was getting compared? Well, if you literally compare s == t, that's like saying, does 0x123 == 0x456? And that's obviously not true, because those are literally two different addresses. So the answer I was getting last week and today was correct. Those addresses are different. But conceptually, of course, I actually intended for the program to compare the actual characters in the strings, not simply the addresses thereof. So how do we go about fixing something like that? Well, using strcmp ensures that we can actually go ahead and compare them character by character, and I don't need to create my own for loop or while loop. The strcmp function does that for me. And we can see this too.
If I go back to VS Code here, get those two strings and, just for kicks, go ahead and print them both out using printf of %p\n, and then again %p\n, passing in those variables s and t respectively. What I should see is that even if I type the exact same thing, we're going to see two different addresses when I make this version of the program. Here's my first HI!. Here's my second. And the two addresses are, it's subtle, very much different. The first one ends in b0. The second one ends in f0. Both of which are hexadecimal values. Questions on any of that thus far? Oh yeah, question in front. Yeah. What's that? >> Really good question. When you create a pointer in memory, or really when you allocate a string or an integer in memory, how does the computer decide where to put it? It uses different chunks of memory for different purposes. And in fact, one of the topics we'll look at after break today is exactly that: how a computer decides where to lay things out. It's often very intentional and it is often auto-incremented, so values will go back to back to back when possible, but over time things will start to get messier, especially in larger programs where you're adding and removing values from memory all the time. So more to come. Other questions on what we have done here? All right, before we break, let's do one other example that elucidates perhaps what can go wrong without understanding some of these underlying building blocks. Let's go ahead and create a program this time that aspires to copy a string, which seems pretty reasonable at a glance, because it's certainly easy to copy an integer: you just set one variable equal to the other. But that's not going to be the case, it turns out, with copying a string. So let me open up, how about, copy.c, a new program, and I'm going to include a few libraries at the top.
We'll use cs50.h so that we can still use get_string conveniently. We're going to include ctype.h for reasons we'll soon see, but we saw that a few weeks back. We'll include stdio.h as always. And lastly, we'll include string.h. Inside of my main function, which won't take any command-line arguments, let's go ahead as before and declare a string s equal to get_string and just prompt the user for s. Then let's go ahead and try to copy s into a new variable t, just like I would copy any two variables, using the assignment operator. Then let's treat the copy, otherwise known as t now, as an array, which we're allowed to do per week 2. So let's say the first character in t we actually want to set equal to the uppercase version of that same character. So this line 12 at the moment is literally, on the right-hand side, saying use the toupper function from the ctype library, which we used a couple of weeks back. Pass in the first character of the copy t and then update the actual first character of t. So let's capitalize t but not s. Now at the very bottom of this program, let's go ahead and print out the value of s at this point in time. And then let's print out the value of t at this point in time. And when I go ahead and make this program called copy and run ./copy, let's type in hi! Let's do it in lowercase. Enter. And we'll see, curiously, that s and t both got capitalized, even though the only character I touched was t[0]. I didn't touch s after making this copy. Now, to be clear about what's going on, why don't we remove one of these training wheels? So string really doesn't technically exist. It's always been a char *. And this string is also a char *. So what's really going on? Well, more clearly now, s is the address of the string that the human typed in. But t is a copy of what? Literally the address of the thing the human typed in, which is going to be one and the same.
So in fact, pictorially, you can think about it this way. If here is my canvas of memory, and the user is prompted for s, and the user types in hi! in lowercase as I did, and it happens to end up down there, what gets stored in s is going to be the address of that memory, which for the sake of discussion is maybe 0x123. So 0x123 is what is stored in s. When I then, on my second line of code, create t, I get another eight bytes of memory, or 64 bits, to store a pointer, a char *, aka string. But what is put in t? Literally s's value, 0x123. So abstractly, it's essentially equivalent to s and t both pointing to the same chunk of memory. So when I do t[0] and go to the zeroth, or first, character of t, that happens to be the exact same chunk of memory that s is pointing to. And so when that lowercase h becomes a capital H, it's as though both s and t have changed. And recall too, if you're enjoying the syntax, if I go back to VS Code here, I did use array notation, but I equivalently could have said *t: go to the address in t, go to the address of that first character, which functionally is exactly the same. We're just not using the syntactic sugar now of the square brackets. That is why hi! is actually being capitalized for seemingly both versions of it, the original and the copy. So how do we go about fixing this? Well, we need a couple of new solutions, namely two new functions here. malloc is going to be a function that allocates memory. So memory allocation, aka malloc. And then free, which is going to be the opposite: when you're done with that memory, you can hand it back to the computer and say, use this for something else. So using these two functions alone, I dare say we can now solve this problem by making an actual conceptual copy of the string, copying h, i, exclamation point, and the null character elsewhere in memory so that we can actually manipulate the copy thereof. So how do I do this? Well, let me go back to VS Code here.
Let me propose that we get rid of much of what we did earlier, except we'll keep around the declaration of s. But now, if I want to create a copy of s, it turns out I'm going to need to ask the computer for as much memory as s itself takes up. So hi! takes up how many bytes in memory? Four is correct, because you need the null character. So how do we figure this out? You can do this. Let me give myself another string called t. But we don't need that white lie anymore. Another char * called t, and set it equal not to s, which we knew was going to go wrong. Set it equal to the return value of this new function, malloc, which is going to return the address of a chunk of memory for me. How many bytes do I want? Well, technically I just want four bytes. So I could do malloc(4). And that will literally ask the operating system, running in the cloud in VS Code, for four bytes of memory somewhere in that black and yellow grid I keep drawing on the screen. I don't know where it's going to be, but I don't care, because malloc's return value will be the address of the first byte thereof. Now, it's a little dumb to hard-code four, not knowing what the human's going to type in, but that's okay. We can do this more dynamically and use our old friend strlen to ask the computer, what is the length of s? And then add one, because we know that we need an extra byte. Even though the length of hi! in the real world is three, we know that underneath the hood we actually need that fourth byte, hence the plus one. Now, to use malloc, I actually need to add another library here, stdlib.h, for standard library, and that's going to give me access to the prototype for, and in turn, the malloc function. Now, with this chunk of memory, it's up to me to copy the string. So how do I go about copying a string from s into t? Well, I can do this in a bunch of ways, but let me propose that we do it like this.
For int i = 0, i is less than the string length of s, whatever that is, i++. And then inside of this fairly mundane loop, let's just set the i-th character of t equal to the i-th character of s and copy, literally very mechanically, every character from s into t. Then down here, let's go ahead and capitalize just the first character of t by using toupper as before, with or without the syntactic sugar. And then at the very bottom of this program, let's print out the value of s itself, just for good measure, to make sure we didn't screw it up this time. And let's print out the value of t, just so we see that I in fact have capitalized t and only t. But I'm not quite done yet. There's a design flaw here and a mistake, but it's subtle. Does anyone want to pick off one or the other? check50 and design50 are not going to like this. Yeah. >> We don't actually copy over the, like, terminating character of the string. >> Yes, because strlen always returns the sort of real-world length of the string. For hi!, that's 3. This would seem to accidentally forget to copy the null character. So I can fix this in a few different ways. I could, for instance, after my loop, actually do something like t[3] = '\0', index 3 being the fourth byte, and manually terminate it myself, because I know it's got to end with a null character. This would be frowned upon too. I shouldn't be hard-coding that number. This is all too sloppy. So don't do this. What I could instead do is say, go up to and through the length of s, because if the length of s is three, but I use less than or equal to, that thing's going to iterate, of course, four times, because I'm starting at zero as always. So that, I think, fixes that problem. But now the design flaw, which is subtle, but we've seen it before. Yeah. Exactly. It's just dumb of me to be asking the computer, what's the length of s, what's the length of s, what's the length of s, on every iteration.
So this is why we introduced this trick where you can set another integer variable, like n, equal to that string length, and then after the semicolon just keep comparing i against n, which means you're not calling functions wastefully as before. All right, if I didn't mess up anything else, let me go into my terminal. Let me do, oh, did I mess something up? Yes, I did mess something up. I should have put this back as well. Thank you. All right. So let's go ahead and do make copy. Enter. ./copy. And now I'm going to go ahead and type in hi! in all lowercase and hit enter. And you'll see now that s is unchanged. It's printed out again in lowercase, but t is in fact capitalized here. Now, why is this? Well, in this case, what's happened is that I've got s in memory, but this time when I allocate t, I then use malloc to get a whole chunk of memory here that initially just contains who knows what, garbage values as we've called them before. I'll just leave them as blank here, but it happens to be, for the sake of discussion, at 0x456, 0x457, 0x458, and 0x459. When I then actually set t equal to the return value of malloc, it's as though t is just pointing to this chunk of memory. Then in my own loop, when I go from zero on up through n, that just means to copy the h, then the i, then the exclamation point and, because of the less-than-or-equal sign, also copy the null character. So this is getting a little tedious though, admittedly. This is a lot of work just to copy a string. Could we be doing this a little bit better? We actually can, because of the libraries we're including. Turns out there are functions for copying strings that come with C. So in fact, if I go back to VS Code here, I don't actually need any of this for loop here, so long as I have actually allocated enough memory for this string, which I do think I have. I can actually use literally a function called strcpy, s-t-r-c-p-y for short, and pass in the destination and the source, in that order.
Almost feels a little backwards, but that's the way it's done to copy s's bytes into t. It's easy to mess them up, but don't mess them up. Per the documentation, the destination comes first and then the source string. So, if I do this now, let's do make copy. We're good to go. If I do ./copy now and type in hi in all lowercase, we still have preserved that good property.

But let me propose that things can go wrong. And in fact, this is about to make the program look way more complicated than feels ideal. But I've been a little lazy here. There's a bunch of things that can go wrong, for which it's worth knowing about the return values of these functions. So all of this time it has been possible for certain functions we've been using, get_string among them, to return, confusingly, this value NULL. Again, humans decades ago decided that one thing would be called NUL. Other humans decided this new thing would be called NULL. N-U-L, pronounced "null", is just the null terminator, backslash zero. It is a single byte of eight bits, all of which are zeros. That's been true for a few weeks now. N-U-L-L happens to be a special memory address, literally 0x0, at which nothing is supposed to ever live. So whenever I describe the top left corner as this is address zero, this is one, this is two: humans years ago decided, you know what, let's just waste byte location zero and never put anything there, so that we have a special value to signal when something has gone wrong. So humans just decided: don't use memory address 0x0 specifically, and a few bytes after it.

So what does this mean? Well, in my code all this time, and since week one, frankly, things could have gone wrong. So in VS Code here, I'm using get_string and I'm using malloc and I'm using strcpy and all of these print statements here, but I'm not actually adding as many error checks as I should.
So it turns out, if you read the actual documentation for get_string, which in fairness we never told you about until now, in cases of error, get_string can return NULL. Why would it ever have an error? If the human types in such a large paragraph of text, maybe, that there's no room in the computer's memory for everything they've typed in. Well, you don't want to just get back part of the text and not know that something went wrong. get_string is designed to return a special sentinel value, NULL in all caps. That just means: I can't oblige. I can't return you a correct value. Here's an error instead. So what I should always have been doing since week one, but we consciously don't because it adds just too much overhead, is check: if s equals equals NULL, then we should abort the program altogether and, for instance, return one as we've done before, to signify error. We cannot proceed, because get_string did not work. That is true of malloc too. Technically we should say, if the address in t also equals NULL, that is, 0x0, we should also return one, because something went wrong.

So, let's do this one more time. Turns out that even toupper is taking for granted the fact that the human typed in anything at all. What if the human just hits Enter? Well, that's a valid string. It's the so-called empty string, quote unquote. But what is the length of nothing? It's going to be zero. And that's problematic, because if you try to go to t at the first location, what is actually there? Well, that's actually the null character, which is not something you should even try to capitalize, it would seem. So, what we should really do here, too, is check: only if the strlen of s is greater than zero should you even bother uppercasing that first character. I mean, at best, it makes no sense, because if there's no string, there's nothing to uppercase. At worst, I could break something by touching memory that I should not. And if I may, there's another issue.
Now, on line 15, I'm asking the computer for memory, and it's going to hand me those four bytes. But technically, I'm never giving them back. This program is so short that it's going to quit pretty soon, so it's not a big deal: the computer will automatically reclaim that memory. But in long-running programs, like servers or things that are running for a long time, if you use malloc and ask for memory but never give it back to the computer, never free it, so to speak, your computer might get slower and slower and slower and slower, essentially because it's running out of memory. Not physically, but the computer thinks it's using all of its memory even if it's not actively in use. You as the human know best. And so at the end of this program, when I am completely done with t, you should similarly call free of t, passing in the address that you allocated previously, so that the operating system gets that memory back. If you don't do that, it's what's called a memory leak. If you've ever used a Mac program, a Windows program, an iPhone or Android program that somehow is just getting slower and slower and slower and slower, that is often a symptom of a human having messed up and not freeing memory that they don't actually need anymore. Questions on NULL or any of these kinds of checks? No. All right. Well, as a teaser, in just a bit, we're going to reveal when and why things can go terribly wrong by way of a little bit of claymation from our friends at Stanford, but it feels like we're long past a good snack break. So, why don't we go ahead and have some oranges and some fruit snacks, and we'll see you in 10.

All right, we are back. So, with memory, a lot of things can go wrong. And in fact, a question came up during the break about whether or not I should have also called free on s, which was the string that I actually got back from get_string. The short answer is no.
This has been a deliberate choice over the past several weeks, whereby CS50's implementation of get_string automatically frees memory that it has given to you once it is no longer needed. So that's a bit of magic underneath the hood. Once you no longer use that, though, that feature goes away. But because I actually used malloc to get my memory for t, I did have to free that specific memory. So the rule of thumb, quite simply, is: if you malloc'd it, you must free it. If we, in get_string, malloc'd it, you do not have to free it yourself.

But of course, things can go wrong. And thankfully, there are tools via which we can find memory-related errors. One thing we're going to show you briefly is another tool called Valgrind, which is a nice complement to something like debug50 and printf and the duck for chasing down, specifically in this case, memory-related errors. So in fact, let me go over to VS Code and open up a program I wrote in advance, because it's just not all that useful, but it is demonstrative of some things that can go wrong. In memory.c we have this code here. We include stdio.h and we include stdlib.h, the latter of which, recall, is necessary now when you want to use malloc and, in turn, free. And inside of this main function I'm doing a few things. I am first allocating three integers in kind of an interesting way, because it turns out that malloc takes as its argument the number of bytes that you want to get. Now, I know on most systems an integer is indeed four bytes. So if I want space for three integers, I could just do 3 times 4 is 12 and put 12 inside the parentheses here. But that's generally frowned upon, because it would make my code less portable to other systems where an int might not be four bytes. So it turns out you can use this operator, sizeof, and actually ask the computer how big a data type like an int is on this specific system. For chars you'll always get back one. For ints you'll usually get back four.
And the same goes for other data types as well. But this is the more dynamic way to ask that question if you want to get three integers' worth of memory. What I'm then going to do is assign, on the left-hand side, the return value of malloc to this variable x, just because. And x itself is a pointer to an integer, more specifically to this chunk of memory, which is a sequence of three integers. This is very arbitrary, and this is only meant to demonstrate things you can do incorrectly, ultimately. But this is how I would dynamically get space for three integers from malloc and store the address thereof in x. So it stands to reason that I could put my first value at x bracket 1 equals 72, my second value equaling 73, and my third value equaling 33. Now, if some of this is rubbing you wrong, this is actually riddled with mistakes already, some of which are old to us. What's the first thing I've done wrong? Even if you have no idea what's going on with line 8, what about lines 9, 10, and 11? What did I do wrong? Yeah.

>> Yeah, my indexing is wrong. We've known for weeks now that with arrays, or with array syntax, you always start counting at zero, then one, then two, not one, two, three. So that's an issue. And this is a new detail. But given that I've used malloc on line 8, what other mistake have I made in this version of the program? What's missing? Free. So I didn't actually call free. So this program has a memory leak. It's asking for memory and never handing it back. Now that's pretty good. You know, a few of us were able to just kind of eyeball the code and debug it. But that's not going to be true for all people, all programs, certainly when the programs get larger and more complicated. So a program like Valgrind's purpose in life is to help you spot these kinds of errors.
So for instance, when I run make memory to compile this program and then do ./memory, at a glance it actually seems perfectly fine, if only because I'm not seeing any errors when I compile it or when I run it. But I do claim that there are at least two that we've seen here. It's just that we're not getting so unlucky that the program is actually crashing as a result. So this is a more latent, harder-to-detect bug. But what I'm going to do now is this. I'm going to open up my terminal window in full screen. I'm going to then do valgrind, space, ./memory, so as to run the Valgrind memory checker on this program. So similar to debug50, but the name now is Valgrind. This isn't a CS50 thing. This is a common program that programmers use. When I hit Enter, the output's going to be atrocious, frankly. It's way more complicated than it needs to be. They put this number here, which means something specific, but it's just stupid that it's on every line of output. So it's overwhelming at a glance. But once you've trained your eyes to look for useful information, there are a couple of useful insights here. So one: invalid write of size 4. That apparently is somehow related to line 11. So let's go there. Let me just minimize my terminal window, look at line 11 of memory.c, and just see which line that was. Okay, invalid write of size 4. Well, writing means changing a value. Reading means accessing a value. So they're sort of opposites. Invalid write of size 4. Well, here's why it's generally useful to know how big an int is. Four: you're trying to write four bytes incorrectly. So why is line 11 invalid? Just to be clear, because the index is off. I'm touching memory that I should not. If I ask the computer for space for three integers, each of which is four bytes, that should give me locations zero, one, and two, not location three.
So you still have to know a little something about programming to be able to make good use of that information, invalid write of size 4. But once you've sort of trained your mind and your eye to catch it, like, ah, now I'm an idiot, I have to go in and fix that problem. But what else is wrong, based on Valgrind's output here? So this is kind of worrisome. Leak summary: definitely lost, 12 bytes in 1 blocks. I don't really know what "1 blocks" means for now, but 12 bytes should be familiar, because if you generally remember that an int is four bytes and you ask for three of them, oh, there's my 12. So, somehow I'm losing 12 bytes of memory. Not in a literal sense, but it means that by the time the program finishes, you have not returned or freed all of the memory that you asked for. So, this line here is your hint that you've done something wrong with respect to 12 bytes in total. And sometimes you'll see slightly different output here. For instance, we see mentioned up here: 12 bytes in 1 blocks are definitely lost in loss record 1 of 1. Very verbose. But the juicy part is, ah, line 8 is the source of that error specifically. So there too, it's a little bit of a breadcrumb leading me to the solution for fixing this. So if I go up here, I look at line 8. Okay, there's only so much that I could have done wrong on line 8. If I've malloc'd the memory on line 8, it sounds like I do need to free it later on. So let's fix both of these problems. The first one is just the indexing issue. Change the 1, 2, 3 to 0, 1, 2. Let's then fix the second problem by just freeing x at the very end. And just for good measure, this was not caught by Valgrind because it doesn't always happen, but there's one other scenario that could go wrong, and it relates to line 8. What should I be doing?

>> I am doing an array, but recall that we can use array syntax on chunks of memory. So technically, what line 8 is doing is this.
It is allocating 12 bytes of memory from the computer, just because, just to demonstrate how malloc works, and it's storing the address of that first byte in a variable called x. The bracket notation is just the syntactic sugar that allows me to change values at x's address. I could alternatively just use pointers and say go to x and put 72 there, go to x + 1 and put 73 there, go to x + 2 and put 33 there, using pointer arithmetic. But those are identical, and generally, you know, most people would just use square bracket notation because it's just a little cleaner and easier to read and write.

Okay, but back to this question. There's still a subtle bug here, based on our example just before the break. What should you be doing anytime you call malloc and get_string and a few other functions, for that matter? Did I hear the answer? Checking for NULL, right? Because if malloc has an error, there's not enough memory for whatever reason, you should not be proceeding to touch that memory, because it might be the null address, that is, 0x0. So what you should really be checking is, well, if x equals equals NULL, there's no more work to be done here. Let's just return one down here. And only if we get all the way to the bottom should we maybe return zero, to signify explicitly that there is in fact successful operation.

All right, with that said, let's go back down here. Remake memory. No error messages from the compiler. ./memory. That too seems okay, but it was fine the first time. Let's now run Valgrind. Let me maximize my window. Run valgrind ./memory. Crossing my fingers as always. And now this is actually pretty good. It's much shorter output, even though it's just as scary at a glance, but most of this is fluff and not very revealing. Heap summary: in use at exit, zero and zero. So it looks like all heap blocks were freed. No leaks are possible. Heap is a word we'll come back to, but this means there's nothing wrong.
In fact, zero errors, which is a good thing. So in short, Valgrind is among the most arcane programs we're going to use. Its output was really designed for those more comfortable, if you will. But there are still juicy insights there. If you just kind of look for things that lead you to, like, this file on this line number, odds are that will lead you to the most subtle of bugs.

In fact, another type of bug is when we do indeed touch memory we shouldn't. So, let me zoom out on that, clear my terminal, and let me open up another program, or maybe write this one real fast, incorrectly. So, let me create a program called garbage.c to demonstrate what we've generally called garbage values. That is, values that are still in memory, but I didn't put them there myself necessarily. I'm going to include stdio.h. I'm going to include stdlib.h. And then, actually, no need for stdlib this time. Let's do int main void. And inside of main, let's give myself an array of, like, way too many exam scores or whatnot. We used to do just a few, but let's say there's 1,024. Then let's go ahead and do for int i equals 0, i less than 1,024, i++, and in here let's go ahead and print out, whoops, let's go ahead and print out using printf each of those scores. Of course, I have clearly forgotten to do something in this program, which is what? I haven't actually put any scores there for real. I've asked the computer, give me an array for 1,024 integers, but I've not used get_int or even manually typed in any of my quiz scores, which we did in the past. That's because I'm intentionally trying to show us garbage inside of the computer's memory. What this loop is going to do on line 8 now is literally print out the first int, the second int, the third int, all 1,024 ints, but all of them should be garbage values, because I myself haven't put anything in those addresses yet. So, let's go ahead and make garbage.
Let's go ahead and maximize my terminal window just to see more on the screen. Do ./garbage. It's going to be super fast output because the computer is fast, but with 1,024 values alone, there is a lot of garbage output. So when we talk about garbage values in the abstract, like here's just some random zeros, a 25, a 32,000, a negative number, and so forth, that's because those are essentially remnants in the computer's memory of stuff that might have happened previously, not necessarily by me in this moment, which is to say you just shouldn't touch that memory at all whatsoever. So now we're actually seeing garbage values for the first time.

Let's consider another example of a program that does contain, potentially, memory errors. And let's look at this too. So this is not really a useful program. It's meant to be demonstrative of some of these concepts. So here we have a program that takes no command-line arguments. Up here we've got a pair of lines that declares two pointers but doesn't yet initialize them to any values. And that's fine. You don't have to have an equal sign with any variable. You just eventually should assign it some value. But this just tells the computer: give me a variable x that's going to store the address of an int. Give me another variable y that's going to store the address of another int. Okay, what happens next? Well, on this line of code, in this simple example, we're allocating enough space for a single integer, just because. It's a stupid exercise. There's no reason to do this other than to demonstrate how malloc works for the moment. malloc returns the address of that chunk of memory. So that's what goes in x. So x is now pointing at, somewhere in memory, four bytes of space that it can certainly put a value at. How do we do that? Well, if you do star x and use the dereference operator, that means go to that chunk of memory and put the number 42 there. That's totally valid.
This says go to the address in y and put the unlucky number 13 there. Unlucky quite literally, because what is y pointing to at this moment? It's just a garbage address. Why? Because if you don't initialize y, who knows what it's going to be pointing to? Maybe it's zero, maybe it's 25, maybe it's 32,000, a negative number, just like we saw in the previous example. You have no idea what values are going to be in x and y unless you yourself put those values there. So, this is highlighted in red because bad things are going to happen if you try to dereference an invalid or a bogus pointer. Even worse than just touching variables that might not have values, if you dereference an address and try going to some random place, the computer is generally not going to like that. And in fact, our friends at Stanford wonderfully brought this particular scenario to life. Even though this example is a bit contrived, just to fit it all on the screen at once, it is going to be the case that bad things happen if we don't check for these values and actually assign valid values, in the form of, as we'll see now, some claymation. So here I give you Binky, which is a bit of claymation from our friend Nick Parlante at Stanford. If we could dim the lights unnecessarily dramatically. [music]

>> Hey Binky, wake up. It's time for pointer fun. What's that? Learn about pointers. Oh goody. Well, to get started, I guess we're going to need a couple pointers. Okay. This code allocates two pointers which can point to integers.

>> Okay. Well, I see the two pointers, but they don't seem to be pointing to anything.

>> That's right. Initially, pointers don't point to anything. The things they point to are called pointees, and setting them up is a separate step.

>> Oh, right. Right. I knew that. The pointees are separate. So, how do you allocate a pointee?

>> Oh, thanks.

>> Okay. Well, this code allocates a new integer pointee, and this part sets x to point to it.

>> Hey, that looks better.
So, make it do something.

>> Okay. I'll dereference the pointer x to store the number 42 into its pointee. For this trick, I'll need my magic wand of dereferencing. Your magic wand of dereferencing. Uh, that's great. This is what the code looks like. I'll just set up the number. And hey, look, there it goes. So, doing a dereference on x follows the arrow to access its pointee, in this case to store 42 in there. Hey, try using it to store the number 13 through the other pointer, y. Okay, I'll just go over here to y and get the number 13 set up and then take the wand of dereferencing and just... Oh, hey, that didn't work. Say, uh, Binky, I don't think dereferencing y is a good idea, 'cause, uh, you know, setting up the pointee is a separate step and, uh, I don't think we ever did it. Hm, good point.

>> Yeah, we allocated the pointer y, but we never set it to point to a pointee. Hm, very observant.

>> Hey, you're looking good there, Binky. Can you fix it so that y points to the same pointee as x? Sure, I'll use my magic wand of pointer assignment. Is that going to be a problem like before? No, this doesn't touch the pointees. It just changes one pointer to point to the same thing as another. Oh, I see. Now y points to the same place as x. So wait, now y is fixed. It has a pointee. So, you can try the wand of dereferencing again to send the 13 over. Okay, here it goes. Hey, look at that. Now dereferencing works on y. And because the pointers are sharing that one pointee, they both see the 13. Yeah, sharing. Uh, whatever. So, are we going to switch places now? Oh, look, we're out of time.

>> I can only imagine how long that took Nick. But the key detail was that bad things happened to Binky when we did this line of code: dereferencing an invalid pointer that had no true value assigned. It was just some garbage value. Now, what's the solution? Well, as Nick proposed, just don't do that. And instead, at least do something sensible, like assign y equal to x.
Not to make a copy of anything per se, but to literally point y at the same location in memory as x. Then a line like this is perfectly valid. You can go to that address, which happens to be the same as the 42's, and that's why in the claymation form we saw that the 42 became a 13 instead. So again, at the end of the day, this is only demonstrative of these basic building blocks that we now have at our disposal, but also of how easy it is to do things incorrectly. So this is one of those "with great power comes great responsibility." C is one of the languages that is incredibly high-performing. It's so close to the hardware that you have so much control over the memory and operation that you can write really good, really fast code. And that's why, even all these decades later, it's among the most omnipresent programming languages in the world. At the same time, you can really screw things up. And when so much of today's software is hacked in some way or crashes for some reason, it is often because humans have just made some simple mistake like this that happens to relate to memory. So with more modern languages that we'll soon see, like Python and, if in high school you studied it, Java, you don't have this much control over the computer's memory. There are many more defenses put in place to protect you and me from ourselves, so to speak. But you pay the price, in that some of those languages tend to be slower and less performant. Yeah.

>> What is the difference here, that we're now playing with memory?

>> This will become clear this week and next. And in fact, some of the examples on which we'll end today will motivate needing to have finer-grained control over what's going on inside of the computer. When you want to deal with files, for instance, you're going to need to know a little something about memory addresses and where things are. When you want to build structures in memory beyond the complexity of an array.
In fact, next week we're going to start building, like, two-dimensional structures in the computer's memory to represent the equivalent of a family tree, for instance, or trees more generally, that can store data in a more efficient way. Up until now, all we have is arrays. And with arrays, you can achieve something like binary search, but we're going to see there are things you can't do with arrays, especially if speed's important.

>> But I was saying, like, for example, if you were to ask me to do this, say, last week, I would be like x equals 13 or something, like assigning a variable.

>> Correct. So last week, if you just said int x equals 13 or int y equals 42 or whatnot, totally fine. And again, this program's sole purpose in life is to demonstrate how you can make mistakes. In and of itself it is not useful here, but it's representative of how we're going to start using this syntax, not only in this week's problem sets but next week's as well. All right. So, with that claim made that we can do a lot of damage, let's consider how pointers and knowledge of memory addresses can actually solve some useful problems. Can we get one volunteer to come on up and help pour a drink? Come on up. All right. What is your name? Come on over.

>> If you want to say a quick hello to the group.

>> I'm Olivia.

>> Okay. And a little something about yourself.

>> Oh, um, I live in Canada.

>> Okay, welcome. Well, come on over here, Olivia. And we have two glasses. Well, really three glasses. So, we have these fancy Ray-Bans that have cameras built in, whereby we can sort of capture your point of view. If you're comfortable, we'll put these on. There are no lenses in them. The white light will mean we're recording. Hopefully, a memorable moment. This battery too is dead. All right. We don't have a backup for the backup, so we're going to pretend that this part never happened. So,

>> Olivia, we have two glasses here for you.
And I'm going to go ahead and pour some colored liquid into both. So, we've got some blue liquid here into this glass. All right. So, we'll fill this up here. And then in this one, we're going to go ahead and pour this orange liquid. And at this point in the story, I'm going to exclaim, "Oh no, I accidentally put the wrong liquid in the wrong glass. So, I got this backwards." So, what I'd like you to do is swap the values in these glasses so that the blue goes into that glass and the orange goes into this glass.

>> Without mixing it or...

>> Without mixing it. So, well, you're hesitating. Why?

>> Well, it would be hard to do unless you can, like...

>> Talk to the mic, if you could.

>> Oh, it would be, like, hard to do without mixing the two, because you don't have anywhere to put the other one.

>> Of course. So, in the real world, this is not really solvable unless, for instance, we have a temporary variable, if you will, like an empty glass in which to do this. So, here is your third variable, if you want to go ahead now and get the blue into that one and the orange into that one. Yeah. No pressure. All right. So, we're putting one value into the temporary variable. We're putting the other value into the original variable. Okay. And now you're probably going to take, yep, I'm guessing, the temporary value and put it into the original variable, and that was very well done. If maybe we can give Olivia a round of applause for just that. Thank you. We have [applause] a little parting gift for you here too. So the goal here really being to create a memorable moment of, like, oh, remember the time Olivia tried to swap two values? She needed a temporary variable, is the takeaway. So why is that? Well, in code, if we wanted to apply the same principle, we're going to need somewhere temporary to put one of those values before we can make this happen.
The catch, though, is that if we don't do this intelligently, it's just not going to work in C unless we take advantage of some of these new capabilities. So, in fact, I'm going to go over to VS Code here and I'm going to open up a program called swap.c that I wrote in advance, whose purpose in life is simply to swap two variables' values. So, I've got stdio.h at the top so I can use printf. I've got the prototype for a swap function, which might as well be Olivia in this case, that's going to take two inputs, a and b, or two glasses, and swap their values, ultimately, is its purpose. Inside of main, though, I'm going to do this. I'm going to set two variables, x and y, equal to one and two respectively. I'm then, just as a point of clarification, going to print out x is such and such, y is such and such. Then I'm going to call the swap function, a.k.a. Olivia, to swap the values x and y. Then I'm going to print out x is this and y is this, so that hopefully I'll see that they've indeed been swapped.

At the bottom of this file, we have the actual swap function. And as you might expect, it takes two inputs, a and b, both of which are integers. I could have called them anything I want. The first thing this function does is grab an empty glass called temp and put a, or the blue liquid, into it. Then we put into a the value of b. So we've sort of lost the value of a at this point, except that we did make a copy of it into temp. And then lastly, we put into b the temporary variable. And at the end, the temp variable is empty. Although technically it still has a copy of the value, it's no longer useful, because the job is done. And a has become b and b has become a. So I dare say this is the literal translation of what Olivia just did. And I like the logic of it. However, when I actually run this program, something goes awry. So let me go ahead and do make swap, ./swap. And I'll maximize my window.
I should see, hopefully, that x is one, y is two, and then x is two and y is one. But no. Even though I literally translated into code what Olivia did, this didn't actually seem to work. And why is that? Well, it turns out that this version of the program is not right, in fact, because of issues of scope. And we've talked about scope before, generally in the context of where a variable lives. We've said that a variable only exists in, like, the most recent curly braces that you opened up for it. And that was true. It's just sort of a colloquial way of describing what scope is. But scope comes into play here because it turns out that a and b, insofar as they are the arguments or parameters for the swap function, have a different scope than x and y. And that still follows the same definition. They're inside of different curly braces than x and y are. So it seems that I may very well be swapping a and b, but I'm not having any impact on x and y. So why is that? Well, in C, all this time, anytime you pass in arguments to a function, you are passing in those arguments by value, so to speak. You're literally passing in copies of the variables to the function you are calling.

So what does this mean? Well, more concretely, if this is a photograph of a chunk of memory inside of the computer, and we sort of zoom in as we've done before, and we abstract away all of the bytes from top to bottom, what's really happening inside of the computer's memory is that we're using some of it for x and y and some other memory for a and b. But how is that in fact happening? Well, it turns out, to a question that came up before the break, memory in a computer is actually assigned in a somewhat deliberate fashion. And generally, if we think of this rectangle as representing my computer's whole chunk of memory, consider what happens when you run a program with ./something, or on a Mac or PC by double-clicking, or on a phone by single-tapping.
What happens is that all of the zeros and ones that were compiled by the company or person who made that program are loaded into the top of the computer's memory, so to speak. This is just an artist's rendition. There's no notion of top or bottom per se, but it's loaded into this chunk of memory at the very edge of the computer's memory, aka machine code, the zeros and ones that compose the actual program. That's where they go. So, they're copied from the hard drive or the SSD, whatever you know it as, the persistent storage, and put there in the computer's RAM, or random access memory, which is the faster memory where programs and files live while you are using them. Meanwhile, if your program or the program you're using has any global variables, global in the sense that they're defined outside of main and not inside of main or inside of other functions, they end up right below that machine code by convention, just so they're accessible everywhere. Meanwhile, there's this big chunk of memory below that called the heap. The heap is the chunk of memory that malloc uses to allocate memory for you. So the first time you call malloc, it's going to give you probably this chunk of memory. The second time this chunk, the third time this chunk, and this chunk, and so forth, back to back to back in memory, but malloc is going to manage all of that for you. You don't have to worry about where it's coming from, but it's coming more generally from this big heap area. But it turns out that the way computers are designed is that the heap sort of grows downward, again even though there's no notion of up or down inside of the computer, but it grows in this direction. But it'd be nice to make use of this other area of memory, and that's what's called the stack. And the stack is the area of memory that's used anytime you create local variables or call functions.
So again, malloc uses memory from up here, and functions and variables use memory down here, just because this is what humans in a room decided years ago is how the computer's memory would be used. Therefore, the stack grows sort of vertically, much like stacking trays in a cafeteria or the dining hall. They go from bottom to top in this model. All right. Well, let's consider for the moment just how the stack is used, because we're using a main function in this program. We're using a swap function in this program. So I claim that those functions are going to use memory down here. Well, how are they going to use it? And how is this in fact bad for our current goal? Well, when you call the main function, it uses this chunk of memory here. Specifically, if main has any arguments like command line arguments, or if main has any local variables, they end up down here in memory. Meanwhile, when main calls swap, swap gets the next available chunk of memory above it, so to speak, in memory, and any of its arguments or local variables end up there. So when swap is done executing, it's as though that memory disappears, even though the zeros and ones are still there, but the computer can now reuse that same chunk of memory later. Ergo garbage values: when functions are being called, going up and down conceptually, that's why you're getting remnants of previous values in the computer's memory. But let's focus on main for a moment. In main in this program, recall that I declared two variables, X and Y, X getting the value one, Y getting the value two, per these two lines of code. Then I called the swap function. So swap is going to get its own chunk of memory, more technically called a frame of memory. And inside of that frame, it has two arguments, A and B, and a local variable called temp. So I'll draw them as such. When you actually call swap passing in X and Y, X and Y are passed in by value, that is to say by copy. So A becomes a copy of X and B becomes a copy of Y.
So this line of code, or rather this prototype for swap, just makes clear that it takes two arguments, A and B, both of which are integers, in that same order. So X comma Y lines up with A comma B. So what happens then inside of the swap function if A is a copy of X and B is a copy of Y? Well, at the moment they're equal to one and two respectively. But consider this first line of code: int temp gets A. So temp takes on the value of A. Next line of code: A gets B. So A gets the value of B, which just happened. Meanwhile, B gets the value of temp. So B gets the value of temp. Now temp still has a copy of one. So it's not quite analogous to the liquid, because that glass is clearly now empty, but it does contain remnants of what it once did. But the key here is that A and B have successfully been swapped. If I were to print out A and B, I would see that they've been swapped. But what has obviously not been swapped in this story? No one has touched X or Y, because when swap returns, especially if I don't even print out anything in swap, X and Y are unchanged. So A and B, the copies, were swapped, but not the original values. And that's the essence of the problem here with this simple example of swapping values, because I was passing by value. But as of today, we now have a solution to this problem. Because before today, if I asked you to write a function that swapped two values, you could not physically do it in code, because you had no way of expressing the solution to this problem. But now we have the ability to pass by reference. That is, use pointers and addresses more generally to tell the function how to go to an address and do something there, and how to go to another address and do something there. How do I express this syntactically? It's going to look a little scary at first glance, but it's just an application of today's new building blocks.
This bad version of the program, where A and B are both integers, just needs to change so that they are addresses of integers. So give the function a sort of treasure map that leads it to the actual X and Y by saying that A is now not going to be an int per se but the address of an int. B is going to be the address of an int. And now to use those values, you can say the following: int temp gets whatever is at location A; go to location A and put whatever is at location B; go to location B and put in the temp value. And here is a perfect example of where this use and overuse of the star or asterisk operator is just cognitively confusing, frankly, because we use star for multiplication. We use it for declaring a pointer. We use it for dereferencing a pointer. Ideally, humans years ago would have come up with another symbol on the US English keyboard to represent these different ideas. But this is where we're at. We're using the star for different things in different contexts. So, this just tells the computer that A is going to be a pointer, an address of an int. This tells the computer that B is going to be the address of an int. This star, when there's no data type to the left of it, means go to that address, as does every other example thereof. So, what's happening this time? If we actually look at the diagram again, X and Y are still one and two respectively. Swap gets called. It gets now the values of the address of X and the address of Y. So pictorially we might draw that as follows. A is pointing to X. B is pointing to Y. I mean, technically it's like 0x123 and 0x12 whatever, but who cares? We're just going to abstract it away now with actual arrows or pointers. The beauty of this now then is, if we look at the swap function, int temp gets star A. That means start at A and go there, sort of Chutes and Ladders style, if you're familiar with the game, and you find the value one. So you put the value one inside of temp, which is why it's there.
Now meanwhile, this next line of code: go to A's address, go to B's address, and copy the latter to the former. So this means go to A. This means go to B, where you find the two. So put the two where A is pointing. Lastly, go to B and put temp there. So that's easy. Go to B and put temp, which is why we now have the one. And the beauty of this now is that when swap is done executing, this memory, this frame, sort of goes away conceptually, even though the zeros and ones are still there, but it's done being used. But we have now mutated the actual values of X and Y by giving swap a proverbial treasure map of the addresses of X and Y, not copies of the values themselves. So hopefully this is the beginning of an answer to why this stuff is useful. You can now solve a whole new class of problem, and even more next week. Other questions, though, on any of the syntax, pictures, or the like? This is good use of pointers now instead of bad. All right. So with that new capability, let us consider here how things can still go wrong and why indeed with this power comes that responsibility. Well, if you consider now that the bad version of the code is fixable via this good version of the code, we've still left a big glaring problem in the diagram itself. Designing something that grows this way against something that grows this way, like, this is not going to end well. Why? Because the more you call malloc, the more memory that gets used here. The more functions you call, the more memory that gets used here. And at some point they will collide, because the computer only has a finite amount of memory. So how do you avoid this situation? You kind of don't. You honestly just make sure that you minimize how much memory you're using by calling malloc only as much as you need to and not calling for a million bytes of memory just because you might need them. You only allocate what memory you need.
And you try not to call functions again and again and again and again without them finally returning. So if you ever did something recursive a couple of weeks ago where you accidentally maybe called a function that never had a base case, never divided and conquered and actually shrunk the problem, you could overflow the stack, or equivalently the heap, by just using too many frames of memory. So it's just a mistake by the programmer themselves. So if you've ever heard these phrases now, which some of you might have, heap overflow or stack overflow, there's a very popular website called Stack Overflow. And this is the etymology thereof. Stack overflow refers to this representative big problem with computers' memories if you're not mindful of how you're using the computer's memory. And this is just the way it is. If you've got a finite amount of anything, that resource can eventually run out, at which point the program will crash or something else might very well go wrong. In fact, these are specific examples of what are more generally called buffer overflows. A buffer is generally just a chunk of memory, like an array, and a buffer overflow is when it gets overflowed with too many values, like allocating a small array and trying to put too many numbers therein. And in fact you can see this very simply if we take off those last of our training wheels. So for instance, these are the functions in the CS50 library: get_int, get_string, and so forth. It's harder to take off these training wheels because C does not fundamentally make it that easy to manage memory yourself. So for instance, let's focus for just a moment on get_int. I'm going to go over to VS Code here in just a second, and let's go ahead and create our very simple program called get.c whose purpose in life is to just get an integer, much like CS50's own function.
So, in get.c, I'm going to propose that we write a program that does a little something like this. Include cs50.h, include stdio.h, and then inside of main, let's go ahead and declare an int n, set it equal to get_int, and we'll just ask the user for the value of n. Then let's go ahead and print n's value back verbatim with printf. This program is simply using the get_int function in order to get an int and store it in n. So let's run it. Make get, dot slash get. Type in a number like 50. Seems to work. And yes, I think this program is correct, even though it is using the CS50 training wheel of get_int. Let's stop using get_int though. It turns out that you don't have to use get_int if you instead use a function called scanf, which scans formatted input, which just means read something from the keyboard into memory. This is essentially what get_string and get_int are using, although that too is a bit of an oversimplification. But let's use it here now as an opportunity to get rid of the training wheel of the CS50 library altogether. And down here let's do this. Instead of using get_int, let's declare a variable n but not give it a value yet. Let's now print out just a little prompt, just to tell the human what we want. We want them to type in a value for n. And now let's use this new function called scanf and say: scan from the user's keyboard an integer, represented by percent i, our old friend and format code. And please put the integer that the human types in into the variable n. This is slightly buggy, though, because if I want a function like scanf to be able to change the value of a variable, just like the swap function, I can't just pass in n. I need to pass in the address of n here. In fact, let's take a moment now to go into the swap function, which we knew to be buggy before, and actually update it to match what we saw on the slides.
I claim that the problem is that we're passing in originally X and Y as one and two into the swap function, and therefore we're passing in copies. But what if we change the swap function to take indeed the address of an int and the address of an int? Let me change my prototype accordingly, because that too must be changed. Then when I change this function to take in those pointers, I need to change my code to dereference them. But there's one last thing I need to do. I'm still, on this line of swap, passing in X and Y, which is literally the values X and Y. If I want to pass in the address of X and the address of Y, what other operator do I now need? The ampersand X and the ampersand Y, to pass in sort of the treasure map, the pointer to those two variables' locations. So if I open up my terminal window now and do make swap on this version, dot slash swap, cross my fingers, now this new and improved version of swap, as claimed, does actually swap the values, the key being that swap now has access not to X and Y per se but to the addresses of X and Y. So if we now close out swap and go back to get, here is the same principle applied to scanf. If scanf exists and it comes with C, its purpose in life is to scan an integer from the keyboard and put it somewhere you want. You can't just give it the variable name, because it's going to get a copy of whatever garbage value is in there. You have to say put this answer at the address of n itself. So lastly after this, let me go ahead and print out n colon and then percent i again as a format code, backslash n, comma n. This line is just my prompt, because I just want the human to know what they're being asked for. This line is printing out n colon and then the actual value. So the only interesting part here is that I'm declaring a variable called n, but I'm not giving it a value myself. I'm using scanf instead of get_int to scan, so to speak, an integer from the keyboard and put it at the address of n, so that scanf has access to that variable.
So if I now do make get, without any CS50 library, dot slash get, let's type in the number 50, and I indeed see the number spit back at me. And just to be clear, printf uses these format codes of percent i and so forth. Scanf uses essentially the same format codes. So that's why I'm using percent i in both places. Both functions, per their documentation, are designed to do just that. So this is great. We've gotten rid of get_int. The catch is that getting rid of get_string is much, much harder. Why? Well, let's try another example. Let's go ahead and try to get a string from the user instead of just an int. So we'll call it string s. But wait a minute. The CS50 library is not included. So we need to use the actual thing that this is. So char star s means give me a variable that's going to store a string. Let's go ahead and print out that prompt, just to prompt the user for s, just for clarity. Now let's use scanf and scan a string with percent s and put it at location s. Then let's go ahead and print out, just as a reminder, that the value of s is now that, passing in s. Now there's something a little bit different here. Notice that I've deliberately not used an ampersand before this s. Why, even though I did before the n? Yeah. >> Yeah. So I want to pass in the address of the string, which is, if I may, already s. s is by definition the address of some string. That is what a char star is, or rather it's the address of a character, but we know already that if you lead it to the first character, whatever function can find the end of it thanks to the null character, except that that's not going to be wholly true here. But I don't want to do ampersand here, because if s is an address, doing ampersand s would be the address of an address, which is actually a thing called a pointer to a pointer, but none of that today. It's going to be correct as written here. N was an integer, so I needed the address of it. S is already a pointer by definition. It's a char star, so I don't use the ampersand here.
But the problem is this. If I now do make get, dot slash get, and let's type in a word like, how about, hi. Okay, it did work. Let me try something even bigger, like hi. Let's just hold this down a lot. Let's do, how about this? A really long string. Oh, come on. Let's type in a really long string like hi. And it's always a gamble to see if I've done this long enough, but okay, it didn't break. Okay, you'd like to think that this is correct, but let's go ahead and do this: valgrind of dot slash get, Enter. Let me maximize my screen. Oh, and let me go ahead and type in a value for s. While Valgrind is running, I'm going to type in hi exclamation point. And now let's actually scroll up to the top of this. A lot of errors seem to have happened here. Use of uninitialized value of size eight. Use of uninitialized value of size eight. A lot of stuff is going wrong here, apparently on, it looks like, maybe line four, which is quite early in the program. And in fact, well, actually that's not it. Multiple lines of code here are where we're having issues. But why? Well, let's focus on the code here alone for a moment. Line five is giving me what? A variable called s. That's the address of a char. But what is s right now? Like, what value is in there? >> It's a garbage value because there's no equal sign involved. I'm just saying give me space. Like, give me eight bytes, 64 bits, to store the address of a character. But if I don't use the equal sign and actually put anything there, it is in fact just some garbage value. The printf is uninteresting. It's just printing out s colon. Scanf, though, is saying go to this address and store the characters that the human typed in. But that means following the wiggly line that we drew on the screen before, because we have no idea where s is pointing. It might be there, there, there, there. You're putting the string at a bogus location in memory. You haven't actually allocated memory.
So when you then try to print it, you're just trusting that you're going to memory again that you control. So what is the solution here? Well, there's a few different ways we could solve this. We could do something like this: actually allocate space for, like, four bytes so that the human can safely type in hi exclamation point with room for the null character. We could change s to actually be an array of size four, because we can treat arrays as though they're addresses and addresses as though they're arrays. It turns out that syntactic sugar really goes in both directions. This too would solve that problem. Or better still, we wouldn't use scanf at all, because how do I know how many characters the human's going to type in? This was a question too that came up during break. Well, hi with the exclamation point will fit in four bytes with the null character. Bye with one will not. So maybe I need five. Well, what if they type in a longer word? Six. Well, maybe there are longer words, seven. Well, maybe a hundred, or maybe a thousand, or 10,000, or 100,000, or a million. At some point, you've got to draw a line in the sand and say you can't type in something longer than this. And you see this in applications all the time. Like on the web, you can only type in so many characters sometimes into forms. And that's for various reasons. Among them is this. get_string, though, will handle almost an infinite number of characters, because the way we implemented get_string is to take baby steps through the input. When you type in a word on the keyboard, or even a paragraph on the keyboard, we, get_string's implementers, call malloc essentially again and again and again and again, just asking for one more byte if we need it, one more byte if we need it, one more byte, so that you don't have to worry about doing that.
The problem is, if you were to write code yourself without the CS50 library or someone else's equivalent library, you have to decide how many bytes you want to allow, and you have to trust that the human is not going to mess around and type in more values than you actually expect. So what's happening with all of these examples thus far is this. If you think of your memory as kind of a minefield of garbage values, it wasn't a problem when we declared n and gave it a value of 50, because we told scanf to go to that address and put the number 50 there, and it fits. That's fine, because an int is always four bytes in this case. Who knows how many times the human is going to hit the keyboard when typing in a string? Could be three or four or a million or anything else. So when we declare s here to be a pointer, it takes up eight bytes, per the Oscar the Grouches here, whereby that's eight garbage values that collectively represent that address at the moment, because we've not assigned it any other value. So if we try to tell scanf go to this address and store hi or anything else there, who knows where it's going to end up in memory, hence the squiggly line again, and the program will quite often crash. I didn't get it to crash because I didn't type in long enough of a string, but it would eventually, if I tried hard enough, crash, because you're touching memory that you yourself did not allocate as an array, via malloc, or some other mechanism. So, what is the solution? Honestly, don't use C for user input like this unless you're prepared to implement that complexity yourself. Use the CS50 library or some other library. This too is why in two weeks we're going to switch to Python, because Python makes life so much easier when it comes to basic things like getting user input, as do many other modern languages. But those languages just have code that other humans have written to solve these problems for you.
So these problems exist, but they'll be abstracted away for you. All right, let's tie this now together with where we began, which was to convey ultimately that we want to have the ability now to actually access files. And we introduce now a topic called file I/O, I/O for input and output. A file is just a bunch of bytes that are stored on disk, where disk might mean a hard drive, the thing that spins around with a platter with lots of zeros and ones on it, or an SSD, a solid state drive, which has no moving parts nowadays, and generally where our data is stored long term. Whereas RAM, random access memory, the yellow pictures we've been drawing, is volatile. That is to say, when you lose power, when the battery dies, you lose everything in RAM. A hard drive or a solid state drive is persistent storage or nonvolatile storage, which means when the power goes out, thankfully, you don't lose all of your documents and essays and so forth, whether it's on your Mac or PC or somewhere in the cloud. But we haven't yet seen any code via which you yourselves can create files. Literally every program we've written, even the phone book example last time when I typed in names and numbers, they got deleted as soon as the program quit and ended. So with file I/O, though, we have the ability now to start creating, saving, editing, deleting files, much like you would from the file menu of Google Docs, Microsoft Word, or the like. Here are just some of the functions that come with the programming language C that allow you to open files, aka fopen, close files, aka fclose, print to a file, scan from a file, read a file, write to a file, lots of different functions, some of which we'll explore this coming week. But why don't we first use them to solve a problem here in VS Code. So, let me go ahead and close get.c.
Let's go ahead and open up a new program called phonebook.c, but implement a persistent version of it ultimately that doesn't just get deleted from memory when the program quits. Let's go ahead and, only because it will make life easier, include the CS50 library still for this. Let's include stdio.h for this. And let's include string.h for this. Then inside of main, no command line arguments, let's go ahead and open a file called phonebook.csv. CSV stands for comma-separated values. Many of you have probably used them in the real world. They're like very lightweight spreadsheets where things are effectively stored in rows and columns, where the columns are represented by just commas between values. And we'll see this in just a moment. How do you open a new file called phonebook.csv? Well, I'm going to do FILE star file equals fopen of phonebook.csv. And then I'm going to do quote unquote w for write. So what's going on here? fopen is opening a file, whether or not it exists yet, called phonebook.csv, and it's opening it in such a way that I will be allowed to write to it. Hence the quote unquote w. Per the documentation, it means I can write to this file and not just read it. The return value is going to be stored in a variable called file, all lowercase by convention, but that file is technically a struct called FILE in all caps. It's a little weird. It's among the few things that is fully capitalized in C. It doesn't mean it's a constant or anything like that. It's just how someone implemented it years ago. This is giving me a pointer to essentially the contents of that file. That's a bit of a white lie. It's technically giving you a pointer to a chunk of memory that represents that file, but for all intents and purposes, it's a pointer to the file for now. Now, let's go ahead and ask the user for a name and number to add to this phone book. Let's do char star name equals get_string, quote unquote name, to prompt the human for that. Char star number. Let's prompt them for that.
And do it with this. And I could be using the string data type, but I'm trying to at least remove what training wheels we don't technically need anymore. And now that we've got a name and number in variables, let's print them to the file. That is, let's save them to the file. Instead of printf, we're going to use fprintf, and we're going to specify what file we want to print to in case we have multiple ones open. What do I want to print? A string followed by a string followed by a new line, ergo comma-separated values, one after the other per line. Then I'm going to pass in the values name and number respectively. And now I'm going to go ahead and do fclose to close that file so that it's effectively saved. All right. So let me go ahead and demonstrate first that phonebook.csv does not really exist. It's empty initially. Let me go ahead and scooch it over to the right here so we can see both at the same time. I'm now going to do make phonebook. Enter. So far so good. Dot slash phonebook, and let me go ahead and type in, for instance, let's see, my name, 617-495-1000, and watch the top right of your screen as the program writes to the file and closes it. All good. All right, let's run it again because maybe, like the iOS app or the Android app, I'm adding new friends to my phone book here. So, I'm going to do dot slash phonebook and I'm going to go ahead and, uh oh, top right just got turned blank. Well, let's try this. Kelly, 617-495-1000. Enter. Okay, she's back. Let me run it again. Dot slash phonebook. Gone. Well, what's going on here? It's not persisting, at least as long as I would like. It seems to be the case that writing to a file means literally rewrite the file. So if you use w, you're going to write to the file, but literally starting at the first byte. If you want to be smart about it and append to the file, well, per the documentation for fopen, you instead use quote unquote a for append instead of quote unquote w for write.
This is a convention in other languages, too. All right, let's start this over. Let me go ahead and recompile this program. Make phonebook. Now, let me do dot slash phonebook. I'll type in my name again first. 617-495-1000. Enter. So far so good. Dot slash phonebook. So far so good. Kelly, 617-495-1000. Enter. And now we're on our way. In fact, I can close this file. I can then open up phonebook.csv. And indeed, it has persisted. And in fact, if I downloaded this file onto my Mac or my PC, I could then right-click it or double click on it and probably open it in Microsoft Excel or Apple Numbers. I could import it into Google Sheets or any number of other spreadsheet tools, because now I am persisting and writing files of my own. Questions on any of the techniques we just tried out here? If we really want to be nitpicky, technically I should fix one bug or missed opportunity. If I open up phonebook.c, I'm going to propose that, as with any use of pointers and addresses more generally, here too something could be wrong. Like maybe I'm just out of space and so fopen can't physically open the file for me. So here too, I should check if file equals equals NULL. Okay, fine. Return one, and then maybe at the very bottom here I return zero to make clear, nope, if I get this far all is well. So in short, anytime you are dealing now with pointers, you should be checking the return values to see if all in fact went well. Yeah. >> Yes, everything we are using is part of stdio.h, which is wonderfully useful now because it has not just printf but fprintf and so forth as well. Good questions. Yeah. >> Yes. So how are pointers used in this code? The short answer is you have to use pointers because this is how C designed files to work. So, we couldn't really introduce you all to files, file I/O, in week one or two or three, because we'd have to introduce this stupid little character to you and you'd be like, "What does this mean? It's not multiplication."
Because the way file I/O works is that when you open a file, you are essentially handed the address of that file in memory. That's an oversimplification. You're technically handed the address of a data structure in memory that references the file actually on disk. But for all intents and purposes, as I said, this gives you a pointer to the contents of the file. And if you want to write to the file, you need to then use fprintf in this case and tell it what file to write to, so it can go there and then store something like this string with these values plugged in. So in short, in C, without pointers, you just can't do file I/O unless it's abstracted away for you by some library. Good question. Other questions on file I/O? All right. Well, let me do one other example here that's a little reminiscent of things we see all the time on our phones and laptops and desktops, like these progress bars for video players. And you're all probably generally familiar with the term buffering, if only because YouTube and other apps, when they are slow or you have a slow internet connection, might say buffering dot dot dot. Well, what does that mean? Well, a buffer is just a chunk of memory. More specifically, it's often an array that is only a finite size that stores bytes of stuff. Well, in the context of a video player, for instance, this red line here, which represents that you're that much through the video, it's an array that stores the next few bytes of a video. And ideally, if you have a fast enough connection, when you hit play, those bytes keep getting downloaded and added to the buffer. And hopefully, you don't finish watching the bytes that have been downloaded before more bytes have been downloaded. So, a buffer is just a chunk of memory, or more specifically an array in a language like C.
Well, just to demonstrate how else you can do things with file I/O, let me propose that we write a simple little program that is our own implementation of the cp program, the copy program that we've used a few times already that allows you in your terminal window to copy one file to another, likening it to this idea of a progress bar, where byte by byte you want to do something, namely in this case copy it, not watch it. So, let me go into VS Code and code up a program called cp.c. And in this program, I'm going to go ahead and include stdio.h at the top. I'm going to then give myself a main function that this time does take, finally, command-line arguments via int argc and our old friend string argv, which today we can now reveal to be also just a char star. In fact, this is how we could now technically write the declaration for main, because string no longer exists without the CS50 library per se. So that's really what's been going on this whole time. Now, let me go ahead and do this. I want to be able to write a program that takes two command-line arguments, actually: the name of the file to copy and the name of the new file to create from it. So let's go ahead and create a file using the same syntax as before called src for short, source, as is a convention. And let's open a file using the file name argv[1], so the first word the human types, and let's go ahead and open it in read mode, because I want to read the source and write to the destination. My next file, FILE star dst, destination for short, will be fopen of argv[2], quote unquote "w". Now why one and two and not zero and one? Because argv[0] is the name of the program, which is not interesting. One and two will contain the next two words that the human types. Now let me propose that I want to copy this file from source to destination byte by byte, similar in spirit to a buffer like this, where you're just grabbing from the internet one byte of the video at a time so as to watch it.
In this case I want to copy it. So how can I do this? Well, we don't have a data type per se for representing a byte, eight bits. However, a common convention is to actually use our new friend typedef and simply declare byte to be something specific. So, let me declare a type called BYTE. And what is a byte going to be? Well, it ideally is just a char, because a char we know is one byte or eight bits. But recall that chars can be treated as integers, and integers of course can be positive and negative. So even though this is a little esoteric, technically I want to define a byte to be what we'll call an unsigned char, which is probably a keyword you haven't yet seen. But it just tells the compiler that this char, that is this sequence of eight bits, cannot be interpreted as a negative number, because I am not doing anything with math. These are just raw bytes or eight bits. So now down here I can give myself a byte and I'll call it b for short. And now I'm going to write a loop, similar in spirit to what YouTube and other players are probably doing, which just iterates over a file byte by byte, making in our case a copy thereof. So while I am reading from the source file into this byte, the size of one byte, one at a time, go ahead and check that I've read at least one. So while the return value of a new function called fread is not equal to zero, go ahead and call fwrite, another new function, going to the address of that byte, grabbing the size of it, which happens to be one, but I'll use sizeof for consistency, grab one such byte, and write it to the destination. This is a huge mouthful, admittedly. The last thing I need to do is close the destination so as to save it, and close the original file, the source. But this huge mouthful, which you'll get more familiar with in the next problem set, is essentially saying on line 12, while I can read one byte at a time, write on line 14 that byte to the file.
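The loop described above can be sketched as follows. The function name copy_file and the error handling are assumptions added so the sketch can be called and checked on its own; the lecture writes the same loop directly in main using argv[1] and argv[2]. The "b" in the mode strings is a standard C option that requests binary mode.

```c
#include <stdio.h>

// A byte is 8 raw bits, so use an unsigned char that is never negative
typedef unsigned char BYTE;

// Copy the file at `source` to `destination` one byte at a time.
// Returns 0 on success and 1 if either file could not be opened.
// copy_file is a hypothetical wrapper around the lecture's loop.
int copy_file(const char *source, const char *destination)
{
    FILE *src = fopen(source, "rb");
    if (src == NULL)
    {
        return 1;
    }

    FILE *dst = fopen(destination, "wb");
    if (dst == NULL)
    {
        fclose(src);
        return 1;
    }

    BYTE b;

    // fread returns how many items it read: 1 while bytes remain, 0 at the end
    while (fread(&b, sizeof(b), 1, src) != 0)
    {
        // Write that same single byte to the destination
        fwrite(&b, sizeof(b), 1, dst);
    }

    // Close the destination to save it, then the source
    fclose(dst);
    fclose(src);
    return 0;
}
```

Reading one byte per call mirrors the progress-bar picture; real copy tools typically read larger chunks into a bigger buffer for speed.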
This implements essentially the idea of the red progress bar going byte to byte to byte, reading one byte at a time, reading from one file, the source, and writing to the other, the destination. And here too, to your question earlier, like why pointers? This is the way file I/O is done. You have to be able to express go to this address, go to this file, if you want to get data from it or to it. And a minor refinement too: technically, when you open files, if you know they're binary files, that is zeros and ones and not ASCII or Unicode text files, you can tell fopen to write and read in binary mode. So there's no mistaking the bits for something other than raw data, an image or otherwise. All right. So, if I go ahead now and do make cp, it so far compiles. Let's try this out. So, here again is phonebook.csv. Whoops. That's phonebook.c. Here again is phonebook.csv with two of us, David and Kelly. Let's try to make a copy of this file as follows. ./cp. So, this is my version of the copy program, not the one that comes with the system. Let's copy phonebook.csv into copy.csv. Enter. Let's open now the copy of the CSV. Enter. And voila. Thank god, it actually worked. I have made a byte-for-byte copy of this file using syntax that was not available to us until today. So who cares? And what's the motivation? Well, it's a lot more fun to treat not just text files and these tiny little examples, but to actually play with real-world examples. And in the next problem set, among the things you'll do is experiment with BMP files, bitmap files, which essentially just means a grid of pixels top to bottom, left to right, much like the cat that our volunteers at class's start created for us. With a bitmap file, you'll store in files literal sequences of pixels or dots, each of which is going to be represented with a specific color: a red value, a green value, and a blue value.
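As a rough picture of "a grid of pixels, each with a red, a green, and a blue value," here is a minimal in-memory sketch. The type name pixel and the function fill are hypothetical, and the real BMP file layout (headers, channel order, row padding) is deliberately not modeled here.

```c
// One byte per color channel, so each value ranges from 0 to 255
typedef unsigned char BYTE;

// A hypothetical pixel type: one red, one green, and one blue value
typedef struct
{
    BYTE red;
    BYTE green;
    BYTE blue;
} pixel;

// Visit every pixel top to bottom, left to right, and set it to one color
void fill(int height, int width, pixel image[height][width], pixel color)
{
    for (int row = 0; row < height; row++)
    {
        for (int col = 0; col < width; col++)
        {
            image[row][col] = color;
        }
    }
}
```

The same pair of nested loops is the skeleton for any per-pixel change: replace the assignment with whatever computation a given effect needs.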
And among the things you'll be able to do, given such beautiful photos as this one of the Weeks Bridge down by the Charles River, is actually make your own Instagram-like filters to apply to photos like this, understanding now, as you do or soon will, how to iterate over the file top to bottom, left to right, over each of the bytes therein and somehow mutate the bytes to look a little bit different. So if this is the original photo, you might be able to make it all grayscale by changing the R's, the G's, and the B's somehow to simpler values that are just black and white and gray tones. You might take that same photo as input and give it more of a sepia tone, like an old-school photograph, instead. You might actually reflect it, like actually put these bytes over here and these bytes over here so as to create the mirror image by reflecting it over the vertical axis here. Or you might even blur the image like this. This is kind of a common feature in a lot of photo editing programs, to either blur or deblur. Well, you can sort of do a little bit of math and make every pixel a little fuzzier by kind of clouding what the human is actually seeing. Or, if feeling more comfortable, you can actually write code, now that you know how to manipulate files and addresses thereof, and actually do edge detection and find the salient characteristics of something like the bridge to distinguish it from the sky and actually find filter-like edges like these. So, those are just some of the problems that you're going to solve over the coming week's problem set, manipulating ultimately files like these as well as JPEGs. And the last thing we thought we'd end on is a sort of computer science joke which, for better or for worse, you're now getting more and more able to interpret. So, I'll leave you dramatically with this here famous joke. Oh, that's more laughter than usual. All right, that's it for week four. We will see you next time.
>> [music] >> All right, this is CS50 and this is week five already, wherein we will focus today on data structures, which is a topic we've touched on a little bit in simple form, but today we'll dive all the more deeply. And for better or for worse, this is our last week on C. Next week, of course, we transition to Python, which is a so-called higher-level programming language, which is really, frankly, just going to make our lives a lot easier. We're going to be able to solve a lot of the same problems but so much more quickly as humans, but not necessarily, as we'll see, as fast when we run the code as the computer might have if we were still using a lower-level language like C. So indeed, thematic over this week and next is going to be the theme we've seen before of tradeoffs. But before we get there, why don't we focus on a couple of data structures that you might encounter in the real world, namely stacks and queues. Let's learn some facts about both of these. If we could dim the lights dramatically. Once upon a time, there was a guy named Jack. When it came to making friends, Jack did not have the knack. So, Jack went to talk to the most popular guy he knew. He went up to Lou and asked, "What do I do?" Lou saw that his friend was really distressed. "Well," Lou began, "Just look how you're dressed. Don't you have any clothes with a different look?" "Yes," said Jack. "I sure do. Come to my house and I'll show them to you." So they went off to Jack's and Jack showed Lou the box where he kept all his shirts and his pants and his socks. Lou said, "I see you have all your clothes in a pile. Why don't you wear some others once in a while?" Jack said, "Well, when I remove clothes and socks, I wash them and put them away in the box. Then comes the next morning and up I hop. I go to the box and get my clothes off the top." Lou quickly realized the problem with Jack. He kept clothes, CDs, and books in a stack.
When he reached for something to read or to wear, he chose the top book or underwear. Then when he was done, he would put it right back. Back it would go on top of the stack. "I know the solution," said a triumphant Lou. "You need to learn to start using a queue." Lou took Jack's clothes and hung them in a closet. And when he had emptied the box, he just tossed it. Then he said, "Now Jack, at the end of the day, put your clothes on the left when you put them away. Then tomorrow morning when you see the sunshine, get your clothes from the right, from the end of the line. Don't you see?" said Lou. "It will be so nice. You'll wear everything once before you wear something twice." And with everything in queues in his closet and shelf, Jack started to feel quite sure of himself. All thanks to Lou and his wonderful queue. All right. Our thanks to Professor Shannon Duvall at Elon University, who kindly put together that animation. And it's meant to paint a picture of a couple of things that we've all encountered in the real world. But more technically, what we just saw were what are known as abstract data types, whereby they're data structures in some sense, but it's really about the design thereof: what characteristics or features or functionality these structures offer, irrespective of how they are implemented in terms of lower-level implementation details. Which is to say you can implement, as we'll see, queues and stacks in any number of ways, which are going to have real-world implications for how you can actually use them and what kinds of problems you can solve with them. So let's consider, for instance, queues in the first place. So a queue is something you sort of experience all the time, anytime you go to a store or go to some event for which you have to line up in a so-called queue. You'd ideally like there to be some fairness property about that queue such that if you got in line first, you get into the store first.
You get to check out first or some other such goal. Meanwhile, the person who got there last actually is at the end of the line and stays at the end of the line and therefore gets served or enters at the end. So queues have what a computer scientist would say is a FIFO property: first in, first out. That is, if you're the first person in line, you're the first person to get out of line. And for many problems, that is a good solution, certainly if you're concerned with fairness. But more technically, a queue has what we'll call two operations: enqueue, which is a fancy way of saying getting in line, and dequeue, a fancy way of saying getting out of the line from the front of it. But those two operations, if you think about it in code, could be implemented with different actual details. And by that I mean this here is one way that we could go about implementing in C code a queue for a bunch of people or persons who want to line up for something. So for instance, we'll decree that this queue can hold no more than 50 people, like that's the physical capacity, and then we define a structure, which we've done a couple of times in the past, whereby this structure has an array of persons that we'll call people, and that will be as big as is the capacity. So this is an array of size 50 for 50 such persons. And then we're going to propose that we also keep track in this implementation of a queue of the current size of the queue. So we're going to make a distinction between the capacity, like how many total people can be there, and the size, like actually how many people are in line at that moment in time, so that you know which of the spots in the array are effectively empty. And we're going to call that whole structure a queue.
Now the catch with this particular implementation in code of a queue is that there is inherent in it a limitation, something you just kind of have to deal with. And I see you nodding. What's your instinct for this? >> For example, 50 students. >> Okay, well I think you hit the nail on the head in that it's only for 50 students or 50 people, which means if a 51st person wants to get into line, you literally have no means of remembering them in this data structure. So how do you solve that? Well, we could just recompile our code after changing the 50 to like 51 or maybe 500 or 5,000. But there's this tradeoff, because you could still be undershooting the total number of people trying to get into maybe a big concert in the case of an extreme. But at the same time, if you overallocate memory, using 5,000 locations in memory, what if only a few people show up? Now you're just wasting memory. And certainly at the end of the day, you only have a finite amount of memory in the computer. So you kind of have to decide a priori, like before compiling your code, how big is this structure going to be? How much space are you going to waste? And in the end, it's all sort of stupid. It would be ideal if instead we could just grow the queue as needed and shrink it, essentially asking the operating system, as we started doing last week, for more memory and then giving it back if we don't actually need that memory. Which is to say we can't really use an array in this static sense. And by static, I mean we're literally deciding in advance, at compilation time, how big this thing is going to be. As an aside, this is also a bit annoying for implementing a queue because you have to somehow keep track of who is at the head of the queue, the front of the queue, because as you start plucking people off, you need to remember who's the next person, effectively. But there are ways in code that we could solve this.
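One way the fixed-capacity queue described above could look in code is sketched below. The people array, the size field, and the capacity of 50 come from the lecture; the head index, the wrap-around arithmetic, the function signatures, and the choice to represent a person as a plain int are all assumptions made to keep the sketch short and self-contained.

```c
#include <stdbool.h>

#define CAPACITY 50

// Assumption: a person is represented here by a simple integer ID
typedef int person;

typedef struct
{
    person people[CAPACITY]; // room for at most 50 people
    int size;                // how many people are in line right now
    int head;                // index of the front of the line (added for this sketch)
} queue;

// Get in line at the back; returns false if the queue is already full
bool enqueue(queue *q, person p)
{
    if (q->size == CAPACITY)
    {
        return false;
    }
    // The back of the line is size spots after the head, wrapping around
    int tail = (q->head + q->size) % CAPACITY;
    q->people[tail] = p;
    q->size++;
    return true;
}

// Leave from the front; returns false if nobody is in line
bool dequeue(queue *q, person *p)
{
    if (q->size == 0)
    {
        return false;
    }
    *p = q->people[q->head];
    // The next person becomes the head, wrapping around at the end
    q->head = (q->head + 1) % CAPACITY;
    q->size--;
    return true;
}
```

Tracking a head index is just one way to remember who is next without shifting everyone forward; the 51st person is still turned away, which is the limitation discussed above.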
So let's consider an alternative to a queue which gives us very different properties, namely a stack. And we saw that in the animation, whereby Jack used a stack to put his clothes into a box so that every time he got dressed he sort of took the sweater from the top, from the top, from the top, and might never wear anything other than black as a result, if he does a wash before he actually reaches the blue and the red sweater there. So a stack, as we've just seen, has a LIFO property to it: last in, first out. So, if I do a load of laundry and I plop some more sweaters on this stack, well, I'm presumably going to use the last sweater that went in first, as opposed to trying to create a mess and, you know, pull the bottommost sweater out, which is just going to be a little more effort than it would be otherwise from just taking it from the top. So, sometimes last in, first out doesn't give you maybe this fairness property you might want for other problems, but it does give you an efficiency, a convenience certainly. So, maybe that might be compelling. And stacks are actually everywhere, too. If you've checked your email recently, odds are you've opened up gmail.com or outlook.com and you've looked at your inbox. And where does the new mail by default end up? At the top. At the top. At the top. And I dare say all of us are guilty of sort of neglecting emails that fall below the break or onto the next page and sort of focusing only on the last in and therefore replying to it first out, which isn't great maybe for the senders of those emails, but it's just how those user interfaces are implemented quite often unless you override those default settings. So how might we implement a stack? Well, we need to implement, more technically, two fundamental operations. The analogs of enqueue and dequeue in the world of stacks are called push, which means push something onto the top of the stack, and pop, which means remove something from the top of the stack also.
And the team in the cafeterias and dining halls on campus do this all day long. Any of the cafeterias or dining halls that have stacks of trays, of course, you put the first tray at the bottom and then the next tray and the next tray and the next tray. And which tray do all of you pick up? Well, presumably the one on the very top, because it's even harder to grab the bottommost tray than it would be for something like a sweater. As a result, there's maybe undesirable properties, like maybe no one ever gets to the nasty tray at the very bottom of the stack because we're constantly replenishing the top ones. But thanks to gravity, that just happens to be the most appropriate data structure in the real world for distributing things like trays in a cafeteria. So, how might we implement that idea in code? Well, funny enough, we can pretty much use the exact same structure. We could just rename queue to stack, because at the end of the day we need to keep track of some number of people. And maybe people is a weird sort of analog here, but we kept everything else the same, so why not that. But the size is also something we still need to remember. And it turns out it's a little easier to implement a stack in this way, because you could always remove from the end of the array, and the first thing that went into the stack, the first in, can always stay at location zero, for instance. But ultimately, we could implement it in this way, but we have the same darn limitation. You can still only put 50 sweaters, 50 trays, 50 people into that stack data structure. So this is just one implementation approach. But that doesn't mean that's necessarily a limitation of stacks and queues. They're abstract in the sense that we could do better. We could maybe start to manage our own memory, move away from statically defining the total size of this array, and just start allocating and deallocating, that is growing and shrinking the data structure instead.
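The array-backed stack described above, where the first item in stays at location zero and items are added and removed at the end, could be sketched like this. The struct fields and the capacity of 50 follow the lecture; the function signatures and the use of int for a person are assumptions for illustration.

```c
#include <stdbool.h>

#define CAPACITY 50

// Assumption: each item on the stack is represented by a simple integer
typedef int person;

typedef struct
{
    person people[CAPACITY]; // same layout as the queue, renamed
    int size;                // how many items are on the stack right now
} stack;

// Put an item on top; returns false if the stack is already full
bool push(stack *s, person p)
{
    if (s->size == CAPACITY)
    {
        return false;
    }
    // The top of the stack is simply the first unused spot in the array
    s->people[s->size] = p;
    s->size++;
    return true;
}

// Take the most recently pushed item off the top; returns false if empty
bool pop(stack *s, person *p)
{
    if (s->size == 0)
    {
        return false;
    }
    s->size--;
    *p = s->people[s->size];
    return true;
}
```

Because both operations touch only the end of the array, no head index is needed, which is why this version is a little simpler than a queue over the same array.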
Which is to say we can make these abstract data types much less abstract with actual implementations. Let's consider an abstract data type that we saw early on but didn't necessarily give this name. A dictionary is yet another abstract data type that's sort of everywhere in the world, literally in the world of dictionaries containing words and their definitions. And you can think of a dictionary, really in the abstract, if you were to draw this on the chalkboard, as really just a two-column table whereby on the left is the word and on the right is the definition. And if it's a physical book, it's essentially the same thing with lots of columns of words on the left, often bold-faced, and then the definitions right next to them. You can also see this in the context of a phone book, which is where we began the course in week zero, where it's essentially a dictionary of names and numbers instead of words and definitions. And a computer scientist would generalize the notion of a dictionary further and just call the thing on the left a key and the thing on the right a value. And these things are omnipresent in computing. And you're going to start to see them all the more today, next week, and beyond, in that if you just want to associate some piece of data with another piece of data, a so-called key-value pair, a dictionary is going to be your go-to data type. But even these, too, we can implement in different ways for reasons that we've already seen. Like maybe there's only a finite size to this dictionary if we're using an array. Maybe we can do better than that. And maybe a dictionary, if implemented one way, is going to be fast. Maybe, if implemented another way, it's going to be slow. So we'll consider these other design possibilities today too, in the context of phone books and other data structures as well.
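As one concrete (and deliberately simple) way to realize the two-column picture above, a dictionary can be an array of key-value pairs that is searched from top to bottom. The names pair and lookup are hypothetical, and linear search is just one of the slower implementation options alluded to above, not the approach the lecture ultimately recommends.

```c
#include <stddef.h>
#include <string.h>

// One row of the two-column table: a key on the left, a value on the right
typedef struct
{
    const char *key;
    const char *value;
} pair;

// Search the rows in order and return the value stored under `key`,
// or NULL if no row has that key. The running time grows with the size.
const char *lookup(const pair *dictionary, int size, const char *key)
{
    for (int i = 0; i < size; i++)
    {
        if (strcmp(dictionary[i].key, key) == 0)
        {
            return dictionary[i].value;
        }
    }
    return NULL;
}
```

With a two-row phone book holding David and Kelly, lookup(book, 2, "Kelly") would return Kelly's number and lookup(book, 2, "Nobody") would return NULL.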
After all, if you have an iPhone or an Android phone and Apple or Google decided that you can only have 50 friends because they implemented the contacts app with an array, I mean, that would be an annoying limitation. So presumably they've done things a little more dynamically, as we'll do today. So let's focus on the first of the data structures we saw back in week 2. That is an array, which recall was just a chunk of memory where you can store values back to back to back, and that was the fundamental definition. The values are back to back to back, or contiguous, in memory, and as we've seen, we generally have to decide in advance the size of an array. So for instance, if we want to store three values like 1, 2, and 3, it might look pictorially like this. Or in code, let's go ahead and implement this same idea and take a moment to whip up our very first program here, and we'll call it, say, list.c. And in this program, let's just do something demonstrative of how you could use arrays to store three things in memory. It's quite simply the numbers 1, 2, 3, but you can imagine it being three people's names, three sweaters, three people, or any other piece of data as well. So, I'm going to go ahead and at the top of list.c include stdio.h. I'm going to then do int main void. So, no command-line arguments. Then, I'm going to go ahead and give myself an array of integers of size three called list. And that's how we've done that from week two onward. Then, just for the sake of discussion, I'm going to hardcode some representative values. So the first value, one, will be at location zero because arrays are zero-indexed. Then I'm going to do the second value, which will be two, at location one. And then the third value, which will be at location two, but the value will be three. Now just to prove that we've stored this correctly in memory, let's just do a quick for loop: for int i equals 0, i is less than 3, i++.
And then inside of this for loop, I'm just going to do a quick printf of "%i\n", printing out the value of list at location i. So it's not a useful program per se, but it gives us an array to play with. It prints out what's in it. So hopefully we will see one, two, and three on the screen. So let me make this list program. ./list. Enter. And voila, we're on our way. All right. But what if now we actually want to change that design and be like, "Oh, shoot. I now have a fourth number that I want to store," or just bought a fourth sweater, or a fourth person wants to get in line, or I want to add a fourth friend to my contacts. Whatever the scenario might be, it stands to reason that ideally you would plop that fourth value right here in memory so that everything remains contiguous. You're still using an array. Your code doesn't really have to change except for the length. For all intents and purposes, it's the same implementation using just a bit more memory. But recall that when you declare an array of a fixed size, you only are getting promised that chunk of memory, not necessarily more memory to the right, to the left, above, or below conceptually. Because recall, in the context of your whole computer, you've got this canvas of memory, all of which represent here bytes. And there could be a whole bunch of actual values or garbage values in memory. So in a more complicated program, that 1, 2, 3 sure might end up here. But if I also had created a string in this program, H-E-L-L-O, comma, world might have also ended up right next to it in memory. Which means I can't just plop the four here, because then if I'm still using that string elsewhere in my program, now it's going to say something other than hello, world, because you're just claiming the H, that byte, as your own, which does not in fact belong to your array.
Of course, it looks like there's plenty of other memory I could use here, because these garbage values, represented by Oscar, are not being used. They've been used in the past, but we treat garbage values as memory we could reuse, certainly. So, wouldn't it be nice to maybe just plop the 1, 2, 3, and 4 in this chunk of memory over here? And I can totally do that. But, of course, if I want to do that, I've got to copy the first three values over and then put the fourth one there and then presumably give back to the operating system the memory I no longer need. So that, in fact, when using arrays, is a perfectly valid solution. And I think we can go ahead and do this in our same program. So let me go back to VS Code here. And instead of statically allocating memory for this array, and by static I mean literally hard-coding the number three here in a way that is permanent, effectively, let me go ahead and do this instead. At the top of my code, let me delete the static allocation of that array from before. And now let me leverage my understanding, if still preliminary, of pointers and memory management from this past week four to just dynamically allocate a guess at how much memory I need initially. So I'm going to go ahead and use malloc and allocate space for three integers. But integers take up a few bytes, and it's usually four, but just for good measure I'm going to say times whatever the size of an int is, as the total number of bytes I want. So presumably it's going to be 3 times 4 equals 12. But I'm generalizing it. But then recall that malloc returns the address of that chunk of memory, the address of the first byte. So if I want to create an array effectively called list, I can't just do int list like this. But what I could say is that, all right, now my list variable is actually going to be the address of an integer, and set malloc's return value equal to that.
So in code here, what I've done is I'm asking on the right-hand side the operating system, please give me 12 contiguous bytes in memory. All of those bytes, of course, can be numerically addressed, like 0x123, 0x124, 0x125. We've had that story before. Malloc by definition returns the address of the first such byte, and it's on me to remember that I allocated 12, if need be. So I'm just storing the address of that first byte in a pointer called list. But recall from last week, there's this functional equivalence we saw between treating a pointer as an array and sometimes even treating an array like a pointer. The C language sort of lets us do this conversion, if you will. So what I could do here now is quite the same syntax as before. I could say list bracket 0 gets one, list bracket 1 gets two, list bracket 2 gets three. And even though I have this fancy new line inspired by week four, the syntax thereafter can be exactly the same. Why? Well, recall that these three lines here using square bracket notation are just syntactic sugar for the stuff we learned last week. Specifically, instead of doing list bracket zero, I could much more arcanely say go to that address in list and put the number one there, please. I can say go to the address list + 1 and put the value two there. I could then say finally go to the address at list + 2 and put the number three there. But this looks ridiculous, and even an experienced programmer might not be inclined to do this if, with fewer keystrokes and more readable code, they could just do instead what I did the first time around, which is functionally the same, and just treat that chunk of memory as though it's an array. And the computer will essentially do the requisite pointer arithmetic to figure out where to put one, two, and three. So even though this is still kind of fresh, hot off the press from last week, it's exactly the same as we tinkered with last week.
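A minimal sketch of the idea just described, mixing bracket notation with the equivalent pointer arithmetic on a malloc'd chunk. The function name make_list is hypothetical (the lecture writes these lines in main), and the NULL check is added here as a precaution.

```c
#include <stdlib.h>

// Allocate space for three ints on the heap and store 1, 2, 3 in it.
// Returns NULL if malloc fails; the caller is responsible for calling free.
// make_list is a hypothetical helper, not a function from the lecture.
int *make_list(void)
{
    // Ask for 3 * sizeof(int) bytes (12 on systems where an int is 4 bytes)
    int *list = malloc(3 * sizeof(int));
    if (list == NULL)
    {
        return NULL;
    }

    list[0] = 1;     // bracket notation, as with an ordinary array
    *(list + 1) = 2; // the same thing spelled out: go to list + 1, store 2
    list[2] = 3;     // back to the more readable form

    return list;
}
```

Note that list + 1 advances by one int, not one byte: the compiler scales the arithmetic by sizeof(int) automatically.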
So suppose now that some time passes and I realize, for the sake of the story, that, oh shoot, I need more than three integers. I need space for four so as to achieve this picture in memory. Well, I could of course just delete all that code, change the three to a four, redo the whole thing, recompile the code, rerun it. But let me propose that we write our code in a way that allows us to change our mind, while the program is running, about how much memory we actually need. And case in point, if you meet someone new, you want to add them to your phone. Well, you obviously don't want to have to wait for Apple to recompile the contacts app and reboot your phone just to add one more person. You want the program just to ask the operating system for more memory for that new person. So in this case, let's just pretend that some time passes and now I want to go ahead and actually change my mind and allocate space for four integers instead. Well, I could do something like this. I could just say literally list equals malloc of 4 times sizeof(int), semicolon. I don't need to redeclare list on line 13 because it already exists from line five. But this is bad, because what have I done wrong here in line 13? I've made a poor decision. Yeah, in front. >> You like waste all the memory that >> Yeah, I'm wasting all of the memory I had from line five because I'm essentially forgetting where it is. If the list pointer is literally a pointer, like a foam finger pointing somewhere in memory, what I'm really doing is saying point it over here now, but I've completely lost track of those other three integers in memory. And that's what we described last week as a memory leak, which you could find with Valgrind. And if you didn't find it or fix it in code, eventually the computer and the program would slow down over time. So this is probably bad. It's not good to just unilaterally change your mind and say, "No, no, no, forget about that memory. Give me a new chunk of memory."
Especially if you want to copy the old memory into the new, just like I did a bit ago when trying to get the 1, 2, 3 into the bigger chunk of memory that can fit 1, 2, 3, 4. So, how might I do this? Well, a temporary variable is kind of our go-to solution anytime we need to remember something in addition to something we already have in mind. So, let me just give myself a temporary variable called tmp, by convention, for short, and set the return value of this malloc call to that. And then what I could do is something like this. Much like my print statement earlier, I could do another for loop and say for int i equals 0, i is less than 3, i++. And then in this for loop, I could say treat that new chunk of memory as an array, so we can set the ith location in tmp equal to the ith location in list. So these lines here copy the old list into the new list. It copies those first three values. And then what I bet I could do at the bottom here is then just manually say go to the fourth location, which when you zero-index is technically bracket three, and set that equal to the number four. So these lines here copy the one, the two, and the three using a loop. And then line 20 here at the moment just adds the fourth value. And again, this is a stupid sort of way to write code in that if you want to put the four there, you should have just done it earlier. I'm just pretending that some time has indeed passed in the program, and I've changed my mind along the way, and I want to let the user add some value to memory. Okay, but before we proceed further, I dare say that there are some other mistakes we should clean up. One of the lessons I preached last week was that anytime you use malloc, what should you do or check for? You should always what? You should always free. So here I'm clearly not freeing any memory. So I should definitely do that. And there was one other rule of thumb with memory. What should you always do when using malloc? Yeah.
>> Check to see if null came back, which just means something is wrong, like it's out of memory or something else went wrong. And if you don't do that, your program may very well crash with one of those segmentation faults that we saw briefly in the past. So, it makes the code a lot more bloated, but it is good practice. So, let's just check if the list pointer I get back contains null. If so, there's no point continuing on. Let's just go ahead and immediately return one because something has indeed gone wrong. And then down here under malloc again, let's do the same. If the temporary pointer also contains null, let's go ahead and similarly return one or any other nonzero value. But here's a subtlety, and let me combine your two ideas. If I immediately return one on line 20 after the second malloc call fails, what should I still go back and do first? Yeah. Yeah. You want to elaborate on your first instinct? >> Yeah. I want to still free the first chunk of memory, because if we execute line five and all is well, that means lines 6, 7, 8, and 9 don't apply. It's not in fact null. We got back a legitimate value. That means we have a chunk of memory given to us for three integers, which means it still exists down here at lines 19 and 20. So if I'm ready now to abort this program and return one to signify error, I first want to free that original list and say to the operating system, here's your memory back. Now, as an aside, strictly speaking, this is not necessary, because the moment the program itself quits, the computer is just going to give back the memory to the operating system. So when programs quit, the memory leaks sort of go away, but your code is still buggy. And generally we're running software that doesn't run for a split second but for minutes, hours, days, continually, in which case it's best practice to squash these memory-related bugs now. Check for null, free any memory, so that you never indeed encounter these kinds of leaks.
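The steps described so far (allocate a bigger chunk into a temporary pointer, check for null, free the original chunk if the second allocation fails, copy the old values, add the new one) can be sketched roughly as follows. This is a minimal sketch rather than the exact code on screen: the helper names make_list and grow_by_copy are hypothetical, and the code is packaged as functions instead of one long main so the failure paths are easier to see. Freeing the old chunk once the copy is done is included here so the sketch does not leak.

```c
#include <stdlib.h>

// Hypothetical helper (not from the lecture): allocates space for three ints
// and stores 1, 2, 3 in it. Returns NULL if malloc fails.
int *make_list(void)
{
    int *list = malloc(3 * sizeof(int));
    if (list == NULL)
    {
        return NULL;
    }
    list[0] = 1;
    list[1] = 2;
    list[2] = 3;
    return list;
}

// Hypothetical helper: allocates a chunk one int bigger, copies the old
// values into it, stores value at the end, and frees the old chunk.
// On failure it frees the old chunk too and returns NULL, mirroring the
// "free the original list, then return 1" step described in the lecture.
int *grow_by_copy(int *list, int old_len, int value)
{
    int *tmp = malloc((old_len + 1) * sizeof(int));
    if (tmp == NULL)
    {
        free(list); // don't leak the original chunk on the error path
        return NULL;
    }

    // Copy the old list into the new chunk
    for (int i = 0; i < old_len; i++)
    {
        tmp[i] = list[i];
    }

    // Add the new value at the end
    tmp[old_len] = value;

    // The old chunk is no longer needed
    free(list);
    return tmp;
}
```

A caller would assign the result back to its own list variable, check it for null, and free it once at the very end.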
All right, so let's forge ahead a little bit more, and let me propose that after we have done the copy, we now want to similarly free the original list. And what I think we're going to want to do, after freeing the original list, is remember that the new list is effectively that which we allocated the second time around. So even though this program is getting a little long, notice that what I've just done is I've said, okay, store in the list variable the address of this new chunk of memory, so that list now, with a foam finger, is effectively pointing here instead of up here. But before that, I made sure to free what my finger was pointing at originally, the list pointer. All right. Lastly, let's just scroll down to the bottom of the code here. I can manually change the three to a four just to demonstrate that I've stored all four values in here. And then at the very end of the program, I think I have to free the list again, because now list is pointing, foam finger and all, at the bigger chunk of memory, the 1 2 3 4. And then I can go ahead and return zero at the very end because all is hopefully well at this point. Let me go ahead and open my terminal window again and make this version of list. I made a lot of mistakes here, it seems. Let's scroll up to the very first: call to undeclared library function malloc, dot dot dot. What have I apparently done wrong or forgotten? Yeah. In back. Yep. Yeah. So stdlib.h is where malloc is actually declared. So let's just add that quickly. Let's go ahead and include stdlib.h in addition to stdio.h. Let me clear my terminal window. Rerun make list. Enter. Now we're good. ./list. And, phew, we see 1 2 3 4. Okay. So at this point in the story, all we've done is write a dopey little program that allocates memory for three integers: 1, 2, and 3.
It then changes its mind and allocates more memory for four integers, freeing the original chunk of memory after copying the first three integers into the new memory and adding that fourth value. But this is kind of a lot of hoops to jump through. And let me propose one refinement here. So if, back in VS Code, we go back into list.c here, it turns out that at least this loop isn't strictly necessary, not to mention the fact that we already have another loop for just printing the list. If I want to more cleverly reallocate memory, it turns out that there's another function that we didn't talk about last week, but is in stdlib.h too, called realloc, which, as the name kind of suggests, reallocates memory, but a little smarter, in that it will try to grow your existing chunk of memory if it can, which is going to be super efficient because then you can just plop the four at the very end. Or if there just isn't room there, because maybe someone else put hello world right there in memory elsewhere in your program, it's going to do all of the copying for you. So what you get back ultimately is a pointer to the new chunk of memory containing all of the original data as well. However, we're still going to have to check for null. We're still going to want to free the original list if something goes wrong and then return one. We're still going to want to add the fourth value because realloc has no idea what more we want to put in the list. But I can in fact delete my other for loop, whose purpose in life was just to copy all of those integers from old into new. All right, that was a lot. Let me pause for any questions. >> How does realloc know that it should reallocate the memory in list? Should you tell it? Like, if you have a lot before, how does it know specifically? >> Very good question. That's because I wrote a bug that we didn't trip over because I didn't compile this version of the code. So the question is how does realloc know what to realloc?
Well, according to the documentation, which I forgot to read, you need to tell realloc what the address is of the chunk of memory that you do want to realloc. So the first argument to realloc, which I did admittedly forget until a moment ago, is the address of the chunk of memory that you already malloc'd earlier, so that it knows to go there and see if there are indeed some garbage values it can reclaim at the end of that chunk of memory, or if it has to wholesale move things elsewhere in memory to give you four times the size of an int this time instead of just three. But still, things can go wrong. You still want to check for this null value, because realloc might not be able to give you enough memory, or your memory could just be so fragmented that even though you want four bytes, maybe there's three bytes over here, two bytes over here, one byte over here. If there aren't four contiguous bytes, realloc too could fail, and it will return null to signify as much. Other questions on any of this? >> Why do we still need the temp variable? >> Why do we still need the temp variable? For the same reasons as before, because if we just say list equals realloc and something does go wrong, realloc by definition will return null but not touch the original memory, in which case we have now lost track of where that original chunk of memory is. So we can never go back to it to print it, to change it, to free it. So we have to use this temporary variable here. Good question. Other questions? Yeah. >> Is there a reason that we free list instead of temp? >> So down here or further down? Okay, so further down. Let me scroll down to where we came from. So here, after we've added this fourth value to temp, I've gone ahead and freed list, which at this point in the story is still pointing to the original chunk of memory, the 1 2 3. Then I am updating list as a variable to point to the new chunk of memory.
Then I'm doing my thing by printing out all of the integers therein. Then I am freeing what list is then pointing to. So I'm not technically freeing the same address in memory multiple times, because in the intervening time I'm moving what list is pointing to. >> Absolutely, yes. It would be correct to go ahead down here and just say temp, because temp is still in scope. It's still pointing at the same thing. I would just argue that that's semantically wrong, because at this point in the code really list is the variable you care about. Temp was really meant to be a throwaway temporary variable, and you're asking for trouble if you use a temporary variable later than you the programmer intended. And if a colleague did that too, who knows what you've done with the temp variable in the meantime. Good questions. Yeah, in front. >> Does realloc always go for the memory space right after your original place? >> Correct. Realloc will try to give you more memory in the same location as before if there's room at the end. >> The code we made earlier originally instead of realloc... >> So realloc will do one of two potential things for you. If the computer's memory looks like this, you're sort of out of luck because realloc can't give you this byte. However, if it finds four bytes down here, for instance, realloc will not only allocate those four bytes for you, it will then copy the data over for you, which is wonderful because it just means we don't need an extra for loop every time we do this. Yeah, in front. >> How does it know how much data to copy? >> How does realloc know how much data to copy? Because the operating system, and you can think of it as the standard library, stdlib.h, keeps track of what memory has been allocated for you in the past. So when you pass in that same address, it knows. It has essentially a lookup table, a dictionary if you will, that tells it what memory has been allocated already.
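The realloc refinement discussed above can be sketched as follows. This is a minimal sketch under stated assumptions: the helper name append_with_realloc is hypothetical, and the logic is wrapped in a function rather than written inline in main. One detail worth stating plainly: when realloc succeeds, the old chunk has already been either grown in place or released by realloc itself, so the old pointer should not be passed to free separately. Only the failure path frees the original.

```c
#include <stdlib.h>

// Hypothetical helper (not from the lecture): grows list from old_len ints
// to old_len + 1 ints using realloc, then stores value in the new last slot.
// Returns the (possibly moved) chunk, or NULL on failure.
int *append_with_realloc(int *list, int old_len, int value)
{
    // The first argument tells realloc which existing chunk to resize.
    // The result goes into a temporary pointer so that list is not lost
    // if realloc returns NULL.
    int *tmp = realloc(list, (old_len + 1) * sizeof(int));
    if (tmp == NULL)
    {
        // realloc failed and left the original chunk untouched,
        // so it is still ours to free before giving up.
        free(list);
        return NULL;
    }

    // realloc preserved the old values (copying them if the chunk moved),
    // so only the new value needs to be written.
    tmp[old_len] = value;
    return tmp;
}
```

Compared with the malloc-and-copy version, the explicit copying loop disappears, while the null check and the temporary pointer remain.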
So you don't have to worry about that. >> Yeah. In front. >> Good question. In other programming languages, you don't always have to declare the length of an array. Case in point, Python, coming next week. That is because someone else who invented that programming language wrote all of this kind of code for you. And indeed, that's one of the goals with our transition between weeks five and six: to demonstrate that all of these problems are still being solved, just not by you and not by me anymore. We're standing on the shoulders of other smart people who have invented not just new code, but a new language and a new compiler, or as we'll see, an interpreter for it, so that we can hide all of these lower level details. Because honestly, as you can see already, this is an annoying number of lines of code just to have a conversation about the numbers 1 2 3 4. In Python, we could reduce this code to like two lines of code, one line of code. It's going to be fun. All right, so with that said, among the goals here was to demonstrate that there are a bunch of ways in which we can implement these data types, but let's talk more concretely about what we'll call data structures, which are concrete definitions of how you use the computer's memory to lay stuff out. And using data structures, you can implement stacks and queues and dictionaries and all of these other things. So, we're going to put into your toolkit today a whole bunch of canonical data structures that every computer scientist does and should know, but that you won't necessarily implement all of the time yourself.
But when you use some feature of Python or Java or C++ or some other language, you are typically choosing among implementations of these data structures that someone else has written the code for, so that you can just benefit from the functionality and the features thereof, like that FIFO property we talked about, or LIFO, without having to get into the weeds too much yourself. So when it comes to data structures, let's consider that we have at our disposal now a few pieces of syntax in C, and we're going to add just one more today. We've seen the struct keyword for a few weeks now. Whenever we want to invent our own data structure, we can use literally struct. We saw in the past that you can use the dot operator to actually go inside of a structure to get at a person's name or their number. And we saw last week the star operator for dereferencing a pointer, dereferencing an address, to actually go somewhere, like inside of a structure. Wonderfully, today we're going to see that you can actually in some cases combine the dot and the asterisk into a single operator with two characters that literally looks like an arrow, and that will help reflect the yellow and black drawings that we've done over the past couple of weeks where we have an arrow on the screen pointing somewhere. This literal arrow in code is going to line up with that same concept. So let's introduce the first of our alternatives to arrays. An array, again, is a contiguous chunk of memory where the values are back to back to back. Among the upsides: it's so fast, because all the data is right there. We've seen since week zero that you can do binary search and just jump around randomly by doing simple arithmetic to go to the middle, the middle of the middle, by just dividing by two a couple of times and rounding as needed.
But the problem with arrays, to be clear, is that they are statically allocated to be a specific size, maybe three, maybe four, but it is a finite value, which is problematic because look at all the code we had to write just to resize these things again and again. Well, what if we try to preempt that kind of pain and try to just build up a list by linking it together, no matter where the values actually are in memory, and move away from this constraint that everything has to be contiguous? After all, as I said a moment ago, the computer might have plenty of memory here, here, here, here that collectively is more than enough, but none of those individual chunks is quite as big as you need for an array. Well, heck, let's at least try to leverage all of the available memory and stitch together the data structure, as opposed to really holding firm to this constraint that the array be back to back to back and contiguous. So, a linked list is something you can now build using that syntax from last week, and a bit more today, in your same canvas of memory. So for the sake of discussion, suppose that we want to store first in our list the number one. Well, we all know already that it might very well exist at an address like 0x123, for the sake of discussion, but it's somewhere there. Suppose that you want to store a second value in memory, but you didn't think about it initially and so you weren't smart enough to put it right next to the one and then the next value next to that. But you know somehow, from malloc or similar functions, that you could put the number two over here at address 0x456, for the sake of discussion, and similarly there's room for the number three over here at, say, address 0x789. So already we have a list of values in memory, but because they're not contiguous, you can't just do some trivial ++ trick to go from one to the other, because they're differing numbers of bytes apart. They're not just back to back, one byte apart.
So what if we try to solve that problem in the following way? Instead of just using one chunk of memory for each of these values, let me waste a little bit of memory, or spend a little bit of memory, and have some metadata associated with our data. So data is the value or values you care about. Metadata is data that helps you maintain the data you care about. So let me propose that we use two chunks of memory for every value, such that the top of each of those chunks represents the actual value we care about, 1, 2, and 3 respectively. And you can perhaps see where this is going. The second chunk of memory that I've allocated to each of these values could perhaps be a pointer to the next one. A pointer to the next one. And if this is the end, we can put our old friend 0x0, aka null, and just treat that as the end of the list implicitly. So even though these things could be anywhere in memory, by just storing with each value the address of the next value in memory, creating effectively a treasure map or breadcrumbs, however you want to think of it metaphorically, we can get from one node to the other. And indeed, that's going to be a term of art we start using. A node is just a generic structure that contains data and metadata, usually the number you care about and a pointer to the next such node. These are not to scale, as an aside. This int is typically four bytes. A pointer, as we've discussed, is technically eight bytes, but it just looks prettier to draw them as simple squares on the screen. So what does this really mean? Well, who really cares about 0x123, 0x456, 0x789? We can really think of this as being more of a picture with arrows. But to keep track of this list of three values, I do propose that we're going to need one additional value over here.
And it's deliberately just a single square, because to keep track of this list of three values, I'm going to use just one variable called, say, list, and store in that variable a pointer as we defined it last week, the address of the first node. Why? Because the first node can then get me to the second. The second node can then get me to the third, and so forth. So what's the upside now? If I want a fourth value somewhere on the screen, I could put it here, here, here, here, wherever there's enough room, and just make sure that I update the arrow to point to that next chunk. Update the arrow to point to the next chunk. There's no copying of data. 1, 2, and 3 can stay there now forever until the program quits and we do actually free them. But we can just keep adding, adding, adding, or growing this data structure in memory. So that is what the world knows as a linked list. In Python, to which you were essentially alluding, a list gives you this same ability to grow and shrink, though under the hood it is typically implemented as a dynamically resized array rather than a linked list. Other languages call these vectors, and they are essentially arrays that can be grown and shrunk automatically without you having to worry quite as much about it. So how does the code for implementing something like this work? Well, let me propose that we have this familiar friend of a person, which we claimed in past weeks has a name and a number associated with them. We know from last week that strings are not technically a thing in C as a keyword. So that's technically just char star name and number, but same idea otherwise. And this is what we defined in the past as a person. So this is a structure we've seen before. I now need to implement the code equivalent of these rectangles, each of which has an integer and then a pointer to the next such value.
So let me propose that we delete what's inside this structure, change the name from person to node, which again is a generic term for a container of values, and let me propose that inside of this new node structure, we put literally an int for the number we care about. There's going to be my 1, 2, 3, or 4. And then, and this is a little bit new, let's include in this structure a pointer to the next such node. It's a pointer in the sense that it's an arrow. It's the address of the next node. So that's why we say node star. I could call it anything I want, but semantically calling it next makes perfect sense because it's the next such node. But this isn't quite right. For annoying technical reasons, I need to do one other thing here. I need to technically, and we've not done this before, give a temporary name to this structure, if you will. So literally say struct node here, even though I've already said node here. Why? Because I technically need to change this line to say struct node star. Long story short, why is this necessary? Well, recall that in the past, C and the compiler read your code top to bottom, left to right. Well, if in a previous version of this code we use the word node here, but the compiler never sees the word node until down here, it's just not going to compile because the word literally doesn't exist yet. We saw this with functions in the past. The solution to that was to put the prototype higher up in the file, and then it would compile. Okay, you can think of this as somewhat analogous, whereby if I give this structure a name on this first line, even if it's redundant with this one, then I can say struct node inside of these curly braces because the compiler has already seen the word node there. So you just have to do it this way. So now that we have this in code, we can kind of start playing around with actually storing these things in memory. So let me propose that we go ahead and do this by transitioning back to VS Code here.
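The structure just described can be written out as the following short definition. The field names number and next come from the lecture. The comments are added here for explanation.

```c
#include <stddef.h>

// A node holds one value plus a pointer to the next node in the list.
// The name "struct node" on the first line is what lets the body refer to
// the type before the typedef's final name "node" has been seen.
typedef struct node
{
    int number;        // the data we care about (1, 2, 3, ...)
    struct node *next; // the metadata: address of the next node, or NULL
} node;
```

Writing just node *next inside the braces would not compile, since at that point the compiler has only seen struct node and not yet the shorter alias.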
And instead of using our array-based implementation, let's implement the first of our linked lists. And I'm going to be a bit extreme and delete pretty much everything inside of main. I am for convenience now going to include the CS50 library, not so much for the char star thing but because, as we discussed last week, it's still useful for getting ints and getting strings and other things which, unless you use scanf, are much harder and more annoying to get in C. So let's go ahead and do this. Outside of main, let's go ahead and invent this node, called struct node here. Then inside of my curly braces, we'll give every such node a number and every such node a pointer to the next such node. And we'll call this whole thing node by convention. Then inside of main, let's go ahead and do this one step at a time. Let me propose that to create a linked list, initially, it's empty. So how do I represent an empty linked list? Well, I could call the variable list and set it equal to null. But what is the data type for a linked list? Well, per the picture that we had up earlier, insofar as all we need is a single pointer at far left here to represent the address of the first node in the list, I dare say all we need to say is that our list is of type node star. That is to say, what is the linked list? Well, it's by definition the address of the first node in the list. So that's the first subtlety here. So that gives me a picture with no other nodes. It just gives me a single pointer initialized to null. Now let's go ahead and, for parity with the previous example, just do something three times. So in this for loop, structured exactly as before, let's go ahead and allocate a new node, ask the user for a number to put inside of it, and then start stitching things together so as to achieve a picture in memory quite like this. So how am I going to do this? Well, first I need to allocate a new node. How do I do that?
Well, I can use our new friend malloc and allocate the size of a node. I want to store the address of this chunk of memory somewhere. And what I'm going to propose is that we have a temporary variable, and I'll call this n, whose type is that of a node star. So what am I doing here? I'm trying to build up this list in memory so that I first have a pointer that is null, pointing nowhere. No list exists. I then want to go ahead and create one new node, store a value in it, and then point my list at that node. Then I want to do it again and again, a total of three times. So how do we do this? We allocate space for the size of a node. However many bytes that's going to be, it's probably going to be at least 12, four for the int and eight for the pointer, and possibly 16 once the compiler adds padding, but who cares? Sizeof will answer that question for me. I'm going to store the address of this chunk of memory inside of a temporary variable called n, for node, and that's why it has to be node star, because it's going to be pointing to an actual node. I'm going to do my quick sanity check. So if n equals equals null, we can't proceed further. I'm going to go ahead and just return one right now. So that's just sort of boilerplate code you should be in the habit of writing anytime you're using malloc. But if all goes well, let's do this. Let's go to the address in n and then go inside of that node and change its number to be whatever the human wants it to be, by using get int and just prompting the human for their favorite number. Then let's go to that same node and update the next field to equal, for now, null, because all I want to do is allocate one new node with that number. That's it. Then I'm going to need to stitch this together further. So I'll propose that all we need do, and let's clean this up first, is now make sure that we string these nodes together.
This syntax isn't quite right, because technically, because of precedence, I need to dereference n and then go inside of it. I need to dereference n and then go inside of it. However, if this syntax is looking a little overwhelming and you have no idea now what's going on, thankfully in C there's much simpler syntax, which is this. Go to the node and go inside it to get the number. Go to the node and go inside it to get next. So the arrow notation that I promised we would now have is the same thing as using the star operator, the dereference operator, parenthesizing it, and then the dot operator, which is just a pain in the neck to write out all the time. I dare say n arrow number and n arrow next is just much simpler. It says go to n and point at the number field or the next field respectively. All right. So the last thing I'm going to propose we do, and then we'll make this much more clear in picture form, is this. Let's go ahead and prepend the node to the list. And by prepend I mean insert it at the beginning. Insert it at the beginning. Insert it at the beginning again and again. I'm going to say n arrow next equals list. Then update the list to equal n. And then after all of this mess, I'm going to return zero. Okay, this was a huge amount of code, but let me give a quick recap. Then we'll paint a picture. Here is my list initially. So the foam finger is pointing to null, which means the list is of size zero. There's nothing there. Then I ask the computer to do this three times. Give me enough memory for a new node. Then after checking that it's not null, put the user's favorite number in it and update the next field for the moment to null. Then lastly, go ahead and prepend this brand new node to the existing list. And by prepend, I mean put it at the front. So n at this moment is pointing to that new node.
And I'm saying, you know what, whatever the current list is, empty or otherwise, set the next pointer equal to the list, whatever that list is, and then change the list to point at this new node. So now let's do this more carefully, step by step, in picture form. So I'm going to propose that we go through some of these representative lines as follows. Here is the first line of code, even without the assignment. If you just allocate a variable called list that's a pointer to a node, what you essentially have is a box of memory that looks like this. It's a garbage value, though, because there's no assignment operator. So who knows what's inside of this pointer. That is why in my actual code I set it equal to null, which effectively creates in memory the same box but gets rid of Oscar the Grouch and puts the null value there. So we know it's not a garbage value. It's a pointer known as null. So that's what that very first line of code did in the computer's memory. The next thing I wanted to do was allocate enough memory for a node, not a node star, for a whole node. I want that whole chunk of a rectangle given to me in memory. That's going to return to me the address of the first byte thereof. And I'm going to store that in a temporary variable called n. So at this point in the story, n is going to be a pointer of its own, another box that initially, sure, is going to be a garbage value, but because I am using the assignment operator, it's going to point to that chunk of memory which malloc, if successful, presumably allocated for me in the computer's memory. So n for all intents and purposes points at that same chunk. These values are still garbage values because it's just a chunk of memory. Who knows what it's been used for before? But that's why after this line of code, I took care to get an int from the user and then initialize the next pointer to null.
So for instance, for the sake of discussion, let's get rid of get int for the picture and just say the human typed in the number one initially. Well, that's equivalent to putting the one in the number field by first going to the address in n, dereferencing it using the star, and then using the dot notation. So that means follow the arrow and then change number to the value one. Or equivalently, you can do the same thing with the arrow. And thankfully now C syntax lines up with what the pictures we've been drawing look like. Go to n, follow the arrow to the number field. That's literally what the syntax is telling me. Meanwhile, if I use that same syntax again for n arrow next and set it equal to null, that's like saying go to n, follow the arrow, and change the next field in this case to null. Or we'll just blank it out to be clear. So at this point in the story, we have allocated the node. We have stored one and null. The list is still null. N is pointing to this, but the whole point of this exercise is to add this node to the list. So we need to somehow update this value, which is why ultimately I'm going to do something like list equals n. Now that seems a little weird semantically, but recall that n is a pointer. That is, it holds an address, 0x123 or wherever that is. So to point list at the same node, it's equivalent to setting list equal to n, because then we'll effectively have an identical arrow from list pointing at that new node. And at this point, I don't even care what n is anymore. It was always meant to be a temporary value. This now is my list. So even though I did it in code already, preemptively, in a loop, the first iteration of that loop literally created this in memory. Let me pause before we go through numbers two and three for any questions, because the VS Code version looks scary. This is perhaps a little more bite-sized. Okay. So, how about we do this twice more for two and three, respectively.
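The first pass through the picture (allocate one node, store a number in it, set next to null) can be sketched like this. It is a minimal sketch that assumes the node typedef from earlier, and the helper name make_node is hypothetical. It deliberately uses both spellings, the star-and-dot form and the arrow form, to show that they mean the same thing.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper (not from the lecture): allocates a single node that
// holds number and points nowhere yet. Returns NULL if malloc fails.
node *make_node(int number)
{
    // Ask for enough memory for a whole node, not just a pointer to one
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }

    // Dereference n, then use dot. The parentheses are needed because
    // the dot operator binds more tightly than the star.
    (*n).number = number;

    // Same idea with the arrow shorthand: go to n, then to its next field
    n->next = NULL;

    return n;
}
```

With an empty list (a node * set to NULL), assigning the returned pointer to list reproduces the one-node picture: list and n hold the same address.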
So, again, inside of our loop, we're back to this line, which asks the operating system for enough memory for the size of a node and stores that address temporarily in a variable called n. So, here's our friend Oscar brought back onto the screen. Maybe the new chunk of memory is over there. This effectively points n at that chunk of memory. The next line of code inside of that loop that's relevant is this. And we'll get rid of get int and just pretend that I literally typed in two. We're going to go to this version of n, follow the arrow, go to the number field, and set that equal to two. The next line of code, we start at n, follow the arrow, change the next field to null. And then, same lines as before, we now need to update list to equal n. But something's about to go wrong here. If I update list to point to the same node that n is pointing at, watch what happens. I set list equal to that, and n, because it's temporary, might as well go away at this point. But what have I done wrong logically here? Yeah. >> You lost the arrow to... >> Yeah, I lost the arrow to the original node. I have orphaned the first node because now nothing in my code is actually pointing at it. I've got, in duplicate, two pointers pointing at this chunk of memory. So this thing, even though we obviously as humans can still see it, we have lost track in code of where it is, and that is the definition of a memory leak. I can never get that back or give it back to the operating system until the program itself finally quits. So, I think I need to be a little smarter and not do this line quite like this yet. I think what I want to do, and I've rewound, so list is still pointing to the original list and n is pointing to only the new node, is something like this. And this is why the code was fairly non-obvious in VS Code at first. Go to n, follow the arrow, go to the next field, and here's the cleverness. Point this pointer to the existing list's value.
So if the existing list is pointing here, that just means, hey, point this to the exact same thing, because now I can safely update the list to point at the same thing as n. So its arrow now points here. And even when I get rid of n, I wonderfully have the whole thing stitched together. And the metaphor I often think of is Christmas time in olden times, when people would stitch popcorn together. That's what you're kind of doing with a thread here. You're trying to stitch together these nodes, or popcorn kernels if you will, such that one can lead you to the next, which can lead you to the next, but you can never let go of part of that strand in the process. So here now we have a list, which is great because notice we haven't touched the one but we've added the two. We can go ahead in a moment and add the three, but you can perhaps see where this is going. I'm kind of doing it backwards by accident, but we'll get there soon. So now let's allocate a new node and run through in our mind's eye all of those same steps. I'm going to hopefully end up with a list that now looks like this. And even though it's kind of long and stringy, these values could be anywhere in memory, but because of these various pointers, I can jump from one location to the other, making more efficient use of everything inside of the computer's own memory. All right, but of course, we've got this symptom that I didn't really intend, whereby the whole darn thing is backwards. But I think that's kind of okay for now. I'd like to propose that we consider how we can now traverse this thing and actually print out the values in memory. So let's go back to VS Code here. At this point in the story we've got the same code that implements that same idea, except I'm using get_int just so that I can dynamically type in the one, the two, and the three without having to hardcode them into the actual code.
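The order of those two pointer updates is the whole trick, so here is a minimal sketch of prepending. It assumes the same node type as the lecture; the function name prepend and the pointer-to-pointer parameter are illustrative choices, since the lecture updates list directly inside main.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Illustrative helper: prepend `value` to the list whose head pointer is
// stored at *list. Returns false if malloc fails, leaving the list unchanged.
bool prepend(node **list, int value)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->number = value;

    // First, point the new node at whatever the list currently points at
    // (NULL if the list is empty), so the old nodes are never orphaned.
    n->next = *list;

    // Only then is it safe to point the list at the new node.
    *list = n;
    return true;
}
```

Swapping the last two assignments would reproduce the leak from the picture: list would point at the new node before anything remembered the old one.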
Suppose that after doing this exercise, I actually want to do something interesting like print the numbers. Well, we don't have that code yet in this version of my program. So, let's bring that back. Last time I did this just using a for loop and array notation, and I think I can do something similar here. But let me propose first that I implement this idea pictorially. Here's the same diagram. This is what exists in the computer's memory. If I want to go ahead and print out these numbers, albeit in reverse order, let me propose that we can do this by giving ourselves another temporary variable. We'll call it ptr, pointer for short. And that's like having another foam finger that points at the start of the list. So it's not pointing at list. It points at whatever list is pointing at, which means here. Then I can print out the three pretty easily, so long as I next update pointer to point to the two and print it out, then point it to the one and print it out, and eventually I'm going to realize, oh, I'm out of nodes because the end of this list is null. So that's the idea I want to implement now logically in code. Create a temporary variable called pointer. Set it equal to whatever the list itself is. Print out the value, update the pointer, print out the value, update the pointer, print out the value, update the pointer, realize it's null, and stop. So in code, it's a relatively small loop, even though the syntax is still pretty new since we've only just started playing with memory since last week. But what I'm going to do is exactly what I proposed. I'm going to create a new pointer called ptr and set it equal to the list itself. That's like having another foam finger temporarily pointing at the first element in the list. Then what I'm going to do is say while that temporary variable is not null, go ahead and traverse the list. What do I mean by that?
Well, let's go ahead and print out the current element in the list by using percent i, backslash n, and printing out whatever the pointer is pointing at, specifically its number field. So that is follow the arrow and print out the number. Then inside of this loop, after doing that, I'm going to update my temporary variable called pointer to be equal to pointer arrow next. And that will have the effect, with just those few lines of code, of implementing precisely this idea. I first set pointer equal to the list, which happens to point here first. I then do my printf, and then I update pointer to be the value of pointer, follow the arrow, next. So if this is 0x456, that is what's now in pointer. So the arrow effectively points there. In my while loop I print out with percent i this number, and then I go to the next field and set pointer equal to whatever this pointer is here, 0x789. So I effectively move the arrow there. Then lastly, I update ptr to the value of this next field, which is null. Which means effectively pointer itself is null. Which means the while loop cleverly stops now, because I was supposed to do this whole loop while pointer is not null, but pointer is now null. And just as an aside, if you prefer the semantics of a for loop, there's nothing new here per se. I can do this exact same thing using a for loop, and it's a little tighter to implement, as follows. Instead of int i equals 0, as in that old approach, I can actually use pointers in a for loop like this. For node star pointer equals the start of the list. Keep doing something so long as pointer does not equal null. And on each iteration of this loop, update the pointer to equal whatever the pointer's own next field is.
And then inside of this for loop, print out, using percent i, backslash n, the current pointer's number field, semicolon. So here is where again we see the equivalence of for loops and while loops. What you can do with one you can do with the other. This is a little more elegant in that you can express a whole lot of logic in one line of the for loop. Frankly, I do think the first version is nonetheless more readable. So let me undo, undo, undo, undo everything I just did. On the course's website you'll see both of these versions. This one's a little more pedantic as to what it's doing step by step. Okay, that, too, was a lot. Let me pause here to see if there are any questions. And if this is feeling like a fire hose, this is why we transition to Python in a week, where all of this gets swept under the rug but is still happening, just not by us. Questions? Yeah. Yeah, really good question. So here I've been preaching that we don't want to lose memory, we don't want to leak memory. And here I am, fairly extravagantly, now spending twice as much memory to maintain this data structure. That's going to be among the themes with all of the data structures we talk about. If we want to gain some benefit, like dynamic growth and shrinking of the data structure, you've got to give me something. And what you've got to give me in this case is the ability to use more space. In a bit today, and after break in particular, we're going to decide we'd really like these algorithms to be faster. Well, that's fine, but you're going to have to give me something in return. You're going to have to spend more space to make the code faster. And so time and space and financial cost and human time and any number of other resources are all things that you need to evaluate as a programmer or a manager and decide which is least and/or most important to you. And right now I don't care about space as much as I care about the dynamism that I'm trying to solve first. Other questions on here? Yeah.
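For reference, the two equivalent traversals described earlier might look roughly like this. It is a sketch rather than the course's exact file: the function names are made up for illustration, and each function returns how many nodes it visited purely so the behavior is easy to check.

```c
#include <stdio.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// While-loop version: ptr is the temporary "foam finger" that starts at the
// first node and follows next fields until it becomes NULL.
int print_while(node *list)
{
    int count = 0;
    node *ptr = list;
    while (ptr != NULL)
    {
        printf("%i\n", ptr->number);
        ptr = ptr->next;
        count++;
    }
    return count;
}

// For-loop version: same initialization, same condition, same update,
// just expressed on one line.
int print_for(node *list)
{
    int count = 0;
    for (node *ptr = list; ptr != NULL; ptr = ptr->next)
    {
        printf("%i\n", ptr->number);
        count++;
    }
    return count;
}
```

Both functions leave list itself untouched; only the temporary pointer moves.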
>> Yes. Why am I using pointer instead of n? Well, yes, I could reuse n at this point. I deliberately chose to use pointer for two reasons. One, I'm using it for a different purpose here. Two, it's not necessarily the best idea to use one variable up here for a specific purpose and then reuse the name down here. Besides, n is out of scope at this point anyway. So it just makes me feel better that I have different variables doing different things, but it would not break if I did it your way. Other questions? Yeah, in back. >> Are pointers temporary? Not necessarily. The linked list we are building up in memory exists because we are using pointers to build this data structure and to keep it intact for as long as the program is running. My temporary variables, n and ptr in this case, are ephemeral, and I'm only using them to stitch things together temporarily. A good question. All right. So let's now motivate why we're spending so much time stitching these things together so carefully. Well, here's our little cheat sheet of common but not exhaustive running times. Let's consider what the running time is for some fairly basic operations like inserting a number into a linked list, searching for a number in a linked list or traversing it, and ultimately deleting numbers from a linked list. So here is my list, initially completely empty. And suppose I go ahead and insert the one, then I insert the two, then I insert the three, using code like we just wrote. I love this approach because even though it looks a little scary at first, this is probably the simplest way to implement insertion into a linked list. Why? Because I'm just constantly prepending the next element. Prepending, prepending, which means all of my hard work is just here at the beginning of the list.
So even if this thing has a thousand elements in it, I'm only manipulating some pointers all the way over here, pictorially at the left, which means it's pretty darn fast. So given that definition in this picture, what would you say the big O running time is of insertion into a linked list when using my current implementation? >> Big O of one. Why? Well, it's not literally one step, but it is a constant number of steps, because if we literally counted the lines of code I was executing, it's a few steps to point one thing up here, point the other thing down here, then update the third, and boom, we're done. In particular, what my current code does not care about is the whole length of this list. Why? Because I'm never traversing the whole thing for the insertion part. I am, obviously, for the printing part, but for the insertion, I'm just prepending again and again. The downside, though, of this approach is that the whole darn thing is coming out backwards. I'm not doing anything with regard to the ordering of these elements, which means what's the running time of search going to be? For instance, if I tell you to search for the number one, find it for me, what's the running time going to be there in big O? Big O of, yeah, big O of n, because in the worst case, it's going to be all the way at the end. And we've seen this scenario before. So, it's big O of n for searching. It's definitely big O of n for traversing or printing, but that goes without saying. If you want to print every element, obviously you have to touch every one of the n elements. But what about deletion? Suppose I want to delete an element. That's going to be in big O of >> n. >> Also n. Why? Because again, in the worst case, it could be all the way at the end. So only insertion as currently implemented is big O of one, because we are exercising full control over where the new elements go, irrespective of what the actual values are.
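The lecture does not show search code at this point, but a linear search over the list is one way to see where the big O of n comes from. The function below is a hypothetical sketch under the same node definition: in the worst case the loop touches every node before giving up or finding the value at the very end.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical linear search: follow next pointers from the first node
// until the value is found or the end of the list (NULL) is reached.
bool search(node *list, int value)
{
    for (node *ptr = list; ptr != NULL; ptr = ptr->next)
    {
        if (ptr->number == value)
        {
            return true;
        }
    }
    return false;
}
```

With the prepended list 3, 2, 1, searching for 1 visits all three nodes, which is exactly the worst case described above.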
So things could escalate quickly here if we do actually want to start keeping things, say, in sorted order, because we can no longer just naively plop things at the very beginning of the list. I think we need to start being a little more careful as to where we put things. So in fact, even though we're doing okay on insert right now, we still have big O of n for the searching and for the deletion, which we won't do in code, as well as of course for traversal. So how else might we go about building this list? Well, let me propose that we could maybe append to the end of the list. Let's try that and see if it gets us anywhere better. So here's my list initially, completely empty, aka null. I go ahead and insert the number one as before, but now in this algorithm I'm going to insert the number two and the number three after it. So this is great because now, by chance, it ended up beautifully in order. But that's because I chose the numbers 1, 2, 3. We'll come back to that detail. Let's consider now what the running time is of this algorithm of insertion using appending to the list. What's the big O running time of insertion now? Big O of n. So it's sort of strictly worse, because now the new node is always going at the end. Now I could be a little smart about it. I could just allocate another pointer and always have it pointing at the end of the list, just as I have a pointer pointing to the start of the list. That's totally fine if you're willing to spend one more pointer, which is a drop in the bucket. A legitimate solution. But where I'd like to go with this is let's maintain sorted order no matter the order in which the numbers are inserted. Whether it's 1 2 3, 3 2 1, 2 1 3, or 3 1 2, whatever order the human types in the numbers, I want to build the structure out such that they always end up in sorted order, just so that my contacts in my iPhone or my Android phone, for instance, are sorted as intended. So how do we go about doing that?
Well, here we're still dealing with some big O. Let's try this. Here's my list, initially empty. Now the user inserts number two first. So it ends up there. Then they insert number one. I'd like it to go there. Number four, it goes over there. And then number three, it ends up here. Even though it's sort of obvious with a piece of paper and pencil how to stitch this together, this is now an annoying number of logical steps, because there are so many opportunities where I could screw up and orphan one or more of these nodes. But let's consider the scenarios that we might encounter. Maybe we get lucky and it's an empty list and we just have to insert one new node. That is trivial. We've done that already. The two was super easy to implement. The one could be really easy to implement too, because that involves the prepending scenario, and we've seen that prepending is super simple. So there are only two other scenarios to consider. One is appending, if it's a really big number and ends up at the end, and we've talked about that but haven't seen code for it. The annoying one, I dare say, is going to be when the new number belongs in the middle. But I propose to think through it this way because now you just have four problems to solve, not just one massive ill-defined problem. You've got scenarios in which you want to insert a new node into an empty list, prepend the new node at the beginning of the list, append it to the end of the list, or put it somewhere in the middle. So that's like four blocks of code in my program. I can now take the proverbial baby steps and implement this bit by bit. And to do this, let me propose that in a moment I'll switch over to VS Code, but, sort of Julia Child style, I'm going to open up a pre-made version of the program that actually gives us a working solution, albeit initially with some bugs.
So here we have, out of the oven, this version of list.c. At the top of the file I've got my same includes as before. I've got my same structure as before. Here I've again got int main(void). I've got the beginning of my list here, setting it equal to null, and then for the sake of discussion, I'm going to insert three values for this example, 1, 2, and 3, by allocating enough room for a node and setting that equal to n. Then I'm going to make sure, as a sanity check, that n is not null, and then I'm going to populate this with the human's first choice of values. So, let me scroll down. As such, there's nothing too new just yet. Here we have the lines of code in which I'm getting an int from the user, setting next equal to null, and then I'm prepending this new node to the list no matter what, per our earlier version that we did on the fly, and then updating the list to point to it. And then down here, I'm printing the numbers. So, this is where we left off, but this is a pre-made version that's nicely commented. It's on the course's website for reference. What I'm not doing now is intelligently prepending, appending, or plopping the node in the middle. So, how do we do that? Let's take a look at this version of the code. So, everything thus far is the same. And if I scroll down, besides the new comments, you'll see that now I'm starting to make some decisions after I have allocated the new node and populated its number and next fields. As an aside, I don't strictly need to initialize the next field to null, because eventually, as in every past example, I update that next field anyway. However, because this one might now end up at the end of the list, and I just want to program defensively, initializing pointers to null before you're ready to assign their value is a good thing in general. So, here's the first of the questions I'm going to ask myself. If the list into which I am inserting this new node is empty, so it's the beginning of the story, super easy.
Just set the list equal to the address of that new node, and we're done. That's what happened when I inserted, a bit ago, the number two for the very first time. So indeed what has just happened here is that now the list, previously empty, contains only a node containing two. However, thereafter there was another scenario. So when we moved on in our story and added the number one to the list, well, that happened to end up at the beginning, but it could also end up at the end or in the middle. So let's break down those scenarios here too. So here, if it is not the case that the list is empty in that if condition, we're going to end up here now in the else. What do I want to do here? Well, let's go ahead and, for now, in this simplified version, append it to the end of the list so we can see that code. How do I do this? Well, I'm using a for loop much like the one I had before, which just allows me to traverse the existing list, whether it has one node or many. And I'm going to ask a question. If following the current node's next field leads me to null, aka the end of the list, okay, let's go ahead and update the end of the list to actually equal the new node. So in other words, if I'm following, following, following all of the arrows and I reach a node whose next field is null, no problem. Update that next field to point to the new node I want to insert. Irrespective of the values, I just want to append this node no matter what. And then I want to break out of the loop. Then at the bottom of this version of the program, it's all quite the same, printing out the numbers using the for loop version of my code from before instead of the while loop, but they're equivalent. But what I did do in advance, in baking this version of the program, is also go through the motions of freeing every one of the nodes afterward, but we'll come back to that. So this version of the code, just to be clear, only appends nodes to the list. It's still not treating things in order.
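The append-only version walked through above can be sketched like this. It follows the described logic (empty list, otherwise loop until a node whose next field is null), but the function name append and its pointer-to-pointer parameter are illustrative assumptions, since the lecture's code lives directly in main.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Illustrative helper: append `value` to the end of the list whose head
// pointer is stored at *list. Returns false if malloc fails.
bool append(node **list, int value)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->number = value;
    n->next = NULL; // defensive: this node may well end up last

    if (*list == NULL)
    {
        // Empty list: the new node is the whole list.
        *list = n;
    }
    else
    {
        // Follow the arrows until reaching the node whose next field is NULL.
        for (node *ptr = *list; ptr != NULL; ptr = ptr->next)
        {
            if (ptr->next == NULL)
            {
                ptr->next = n;
                break;
            }
        }
    }
    return true;
}
```

Because the loop walks the entire list every time, each append costs big O of n steps unless a separate pointer to the last node is maintained.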
But we've now seen two of the scenarios plucked off. The list is empty, or it has numbers and we want to put something at the end. So let me propose now that I take out of our distribution code another version of this program that does that and a bit more. I'm going to go ahead and open up in just a moment a new and improved version of list.c. And now it looks almost the same at the top. Scrolling down, scrolling down, scrolling down, here's some now familiar code. If the list is empty, do that simple thing as before and just set the list equal to the new node. But here is now where we're adding an inequality. So if the number in question belongs at the beginning of the list, that is, if the number in the new node n is less than the number in the current list, which is presumed to be the first node at the moment, then go ahead and update the new node's next field to point at the existing list, and then update the list to point at this new node, thereby taking us from two in the list to one and two in the list. To be clear, if I go back to VS Code here, what's happened here is, because one is less than two, of course, I'm going to update the new node's next field to point to the list. What does this mean? Well, the new node at this point in the story is the new node for the number one, because that's the second thing we're inserting. I'm going to update its next field to be whatever the list a moment ago was already pointing at. So this is the after effect, but a moment ago list was pointing at only the two. So now the next field of the one points at the two, and then lastly, here in this line, I update the list pointer to be the address of that new node. And here's where I'll wave my hands a little bit today because it starts to escalate quickly. It's useful, and it might very well be useful for problem set five in particular, but I think it's more healthily reviewed step by step at a slower pace.
Here is where I'm asking myself, all right, if it's not the only element in the list and it doesn't belong at the beginning of the list, well, it belongs somewhere later in the list, which gives me two final scenarios. Let's figure out which scenario we're in. Let's use this for loop to iterate over as many of the nodes in the list as we need to. If we get all the way to the end, because our pointer variable's next field now equals null, it's like following the arrows, following the arrows, and maybe we're trying to insert the number five. I've already hit the number four. I've hit null. Five belongs at the end. So here we have our promised append code, which is exactly the same as before, but now I'm doing it conditionally, if I've indeed found my way to the end of the list. And then lastly, let me scroll down just a little bit. If it's not the case that the list is empty, and it's not the case that the new node belongs at the beginning, and it's not the case that the new node belongs at the end, I'm just somewhere in the middle of the list, because the new number I'm inserting is less than the one I'm looking ahead at here. And it's okay to use two arrows, but I'll wave my hands at that for now. These three lines, two pointer manipulations and a break, are what's going to stitch together that three in between the two and the four. And let me propose, for lecture's sake, take this on faith that this collectively does stitch things together properly. But I do think, as you'll see in problem set five, it's a much better exercise to think it through a little more carefully step by step, because there's just a lot of fine-tuning of these pointers together, and the order of operations does matter. But at the very end of this program, notice this is kind of mindless, even though the syntax is undoubtedly less familiar. Here is how, just like traversing the whole list to print it out, we can similarly do one more pass over the linked list and free every one of the nodes.
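The four insertion scenarios (empty list, beginning, end, middle) might fit together roughly as follows. This is a sketch reconstructed from the verbal walkthrough, not the course's actual file, and the function name insert_sorted and its parameter are assumptions made so the logic can stand alone outside main.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Illustrative helper: insert `value` so the list stays sorted from
// smallest to largest. Returns false if malloc fails.
bool insert_sorted(node **list, int value)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->number = value;
    n->next = NULL;

    if (*list == NULL)
    {
        // Scenario 1: empty list.
        *list = n;
    }
    else if (n->number < (*list)->number)
    {
        // Scenario 2: belongs at the beginning, so prepend.
        n->next = *list;
        *list = n;
    }
    else
    {
        for (node *ptr = *list; ptr != NULL; ptr = ptr->next)
        {
            if (ptr->next == NULL)
            {
                // Scenario 3: reached the last node, so append.
                ptr->next = n;
                break;
            }
            if (n->number < ptr->next->number)
            {
                // Scenario 4: belongs between ptr and ptr->next.
                // Point the new node forward before rewiring ptr.
                n->next = ptr->next;
                ptr->next = n;
                break;
            }
        }
    }
    return true;
}
```

The "two arrows" mentioned above appear in ptr->next->number: the loop looks one node ahead so that ptr is still the node just before the insertion point.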
But notice it's not quite as simple as just saying free the whole list. free is not that smart. malloc is not that smart. And even though you have called malloc one, two, three times, you really have to call free one, two, three times. You can't just pass in the beginning of the linked list and say, you figure out what to delete, because it has no idea what a linked list is or what your data structure actually is. So the reason that this loop is a little complicated is that what I'm doing with these three lines is essentially traversing my list and making sure that when I'm ready to delete the one, I have a pointer pointing at the two, and then I free the one. I update my pointer to point at the three, and then I delete the two. I update my pointer to point at the four, then I delete the three, and then I delete the four. So, there's a bit of trickery involved in making sure you don't orphan things step by step. Okay, that was a lot. Let me pause here to see if there are in fact any questions, even though we're deliberately waving our hands at some of those details. Questions on this? Now, let me add one final flourish. If we were to really quibble over this, I mean, my god, we're up to 80 lines of code already just to implement the numbers one, two, three, four. But there are some subtle bugs in here at the moment. So, for instance, suppose that something goes wrong with malloc inside of this for loop here. And suppose that it's not your first iteration. Something goes wrong on maybe the second or the third iteration. Why is this error check suddenly bad as I've implemented it? Yeah, I didn't free the memory from the previous iterations.
So this is where memory management starts to get really annoying, because if you do want to practice what I've been preaching, which is free any memory you've allocated, and you've already allocated one, maybe two nodes, because malloc is failing maybe at the last iteration here, you have to somehow go back and free all of that. And that's fine. We have code at the bottom of my file here which could traverse through the existing list and just free it all. So I could just copy and paste that code, put it into my if condition, and then run that code too to delete the whole list. But at this point, if you're copying and pasting, you're probably doing something wrong. And so let me propose, as a final version of this just for your reference later, version nine of this file, zero indexed. Give me one second to just make a quick copy and copy it over. In list9.c, our last version of this, we have the following, whereby now in my main function I have the exact same code as before, but I've taken the liberty of implementing an unload function so that I can call it here as well as at the bottom of this main function. So I can unload the list here or unload the list there. And all I've done now is, in good form in terms of design, implement the notion of deleting a linked list in its own function, so I could call it any number of times from any number of places. But just so you've seen how I might do that there. All right. So let's ask the question after all of this. What is the running time of inserting into a linked list? Big O of, say it a little louder, big O of >> n. Damn it. That's no better. All right. What's the running time of searching a linked list? >> Big O of n. Damn it. What's the running time of deleting from a linked list? >> Big O of n. So everything is literally big O of n. So there's the price we've suddenly paid.
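A sketch of that idea follows: the freeing loop lives in one function, which is then called both on the malloc failure path and at the end. The name unload comes from the lecture, but its exact signature is an assumption here, and build_list is a purely hypothetical stand-in for the loop in main.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Free every node in the list. The next pointer is saved before each free
// so that the rest of the list is never orphaned.
void unload(node *list)
{
    node *ptr = list;
    while (ptr != NULL)
    {
        node *next = ptr->next;
        free(ptr);
        ptr = next;
    }
}

// Hypothetical stand-in for the loop in main: prepend 1..count. If malloc
// fails partway through, free the nodes built so far instead of leaking them.
node *build_list(int count)
{
    node *list = NULL;
    for (int i = 1; i <= count; i++)
    {
        node *n = malloc(sizeof(node));
        if (n == NULL)
        {
            unload(list);
            return NULL;
        }
        n->number = i;
        n->next = list;
        list = n;
    }
    return list;
}
```

Calling free once per malloc is the point: unload makes exactly as many free calls as there are nodes, and it is safe to call on an empty (NULL) list.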
We have, an hour after we started with arrays, gotten to the point where we can dynamically grow a linked list and, I dare say, even though we've not done it and won't do it today, shrink the linked list by freeing things that we don't need. So we have the dynamism, and we can make more efficient use of memory even if it's very fragmented and there's a few bytes here, a few bytes there. But we've paid this price, because with arrays, recall even our phone book example, we at least had binary search, the running time for which was big O of log n. So my god, not only are we spending more space, the darn thing is slower. Surely this is not how our phone contacts are implemented. Surely this is not how stacks and queues are always implemented. And indeed it's not. This is just going to be a stepping stone to now doing a sort of mashup of data structures, whereby we take the best features of arrays and the best features of linked lists and mash them together to get new and improved data structures. But for that, we're going to have to have some cookies first, and we'll come back in 10 minutes. Cookies are now served. All right, we are back. So, let's recap how we got here and why. So, we started with our old friends arrays, which we introduced in week two. And recall that the whole appeal of arrays was that, one, they're relatively simple as all things go, certainly now in retrospect, but more importantly, they were really darn fast. Arrays, in so far as they are stored back to back, contiguously in memory, meant that we could do very simple arithmetic, recall, to figure out the length, then divide by two to get the middle, divide by two again to get the middle of the middle, and so forth. And even though we might have to deal with a little bit of rounding, arrays lent themselves to binary search and thus logarithmic time, so big O of log n. But today I claim that the downside of arrays is that you have to decide in advance how big you want them to be.
And if you guess wrong, and the amount of memory you ask for is too small, you then have to reallocate memory. And that's fine. It's solvable with malloc or realloc. But it's going to take some amount of time to copy all of the old memory into the new memory, whether you do it with a for loop or realloc does it for you. Meanwhile, we only did it with three values, maybe four. But imagine it being 3 million values that you now need to allocate more space for. You're going to waste a huge amount of time copying 3 million values from the old location to the new. And so that's just generally not very appealing. And so that motivated our whole discussion of linked lists, whereby now we can create a more dynamic data structure in which we only allocate memory as we need it. So we don't have to worry about underestimating or overestimating and therefore wasting memory. We can just go bit by bit. For each new value, we allocate another node, another chunk of memory, and the thing just grows and grows and grows. But as we saw just before break, the downside is, even though we're avoiding the inefficiency of having to move stuff around in memory, because once allocated, the nodes can stay where they are and we just update our pointers, all of our running times for searching, inserting new elements, and deleting old elements would seem to be big O of n. But why was that? Well, in the context of a linked list, recall that it might look a little something like this, whereby we have a pointer called list pointing to maybe four values like this. And suppose that we do want to search for a value. Now, it's nice because in our latest version of this linked list, it was sorted from smallest to largest. And that was always a precondition of doing binary search. But even though it's obvious to our human eyes where the middle is, it's roughly over there, how is the computer going to figure that out? How is the code that you write going to figure that out?
Well, unfortunately, the way we've stitched a linked list together with these pointers, if you want to find the middle, you can, but you've got to start at the beginning, traverse the whole thing to figure out how long it is, then do it again and stop halfway through once you know roughly where the halfway point is. Then, if you want to search the middle of the middle, you've essentially got to do that whole process again. And so now, just to use binary search, you need to spend big O of n steps just to even find the middle. Now, if your mind is kind of spinning and you're thinking, well, maybe I could just kind of cheat and use a pointer to always point to the middle of the list, totally fine. You can spend some additional space to remember the middle of the list, the end of the list. But where does that stop? What if with binary search you go not just to the middle, but to the middle of the middle, the middle of the middle of the middle? Are you going to keep around a pointer to every element? Because if you do, you're essentially back to an array, if you've got one location for every other location. So it just kind of devolves into a mess, even though there are some minor optimizations we could in fact make. In fact, we didn't talk about it yet, but there's one common alternative to a singly linked list, which ours is, since it's linked with a single pointer from node to node. Computer scientists also like to talk about doubly linked lists, where there are arrows going in both directions, which actually would have simplified some of the last code that we looked at, because I wouldn't have to look ahead to figure out what I want to free or where I want to insert some value. But that too doesn't fundamentally change the speed. It just makes your code a little easier to write. So in short, with linked lists, we get dynamism. We can now grow and shrink things without wasting time copying. But we've lost hold of our binary search.
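To make the cost concrete, here is a hypothetical sketch of the two-pass approach just described for finding the middle of a singly linked list. The function is not from the lecture and the node type is the same assumed definition as before. Both passes follow next pointers one node at a time, so the work grows linearly with the length of the list, unlike the single division an array needs.

```c
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: return the node roughly halfway through the list,
// or NULL if the list is empty.
node *middle(node *list)
{
    // First pass: traverse the whole list just to learn its length.
    int length = 0;
    for (node *ptr = list; ptr != NULL; ptr = ptr->next)
    {
        length++;
    }

    // Second pass: start over and stop after length / 2 steps.
    node *ptr = list;
    for (int i = 0; i < length / 2; i++)
    {
        ptr = ptr->next;
    }
    return ptr;
}
```

Repeating this for the middle of each half is what makes binary search unattractive on a linked list: every "jump" is really another walk.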
And that was very appealing as far back as week zero when we wanted to do something quite quickly. So let's see if we can't make some mashups now. Take some arrays, take some linked lists, literally mash them together into a sort of Frankenstein data structure, and see if we can't get some of the speed of arrays but the dynamism of linked lists. And so I give you trees. If you think about in your mind's eye what a family tree looks like, where you typically have some parents and then some children and some grandchildren and so forth, it's this sort of tree-like structure, even though by convention it's drawn top down instead of bottom up like trees in the real world. But the top of that family tree we're going to call the root of the tree. It just so happens to grow down. But a tree is a very common data structure, and it's interesting vis-à-vis arrays and linked lists in that it's the first of our two-dimensional data structures. An array is effectively just a single dimension, from left to right. A linked list is essentially the same. Even though in reality it might be up, down, left, and right in memory, it's still just one thing stitched together in a single dimension. A tree now adds a second dimension. And specifically useful for us is what we're going to call binary search trees, which, spoiler, are going to give us back the ability to use binary search. But we're going to store the data a little more cleverly than in arrays alone. Instead of storing our data in one dimension, in a binary search tree we're going to store it in effect in two different dimensions. And that's going to gain us some speed. So here for instance is an array of seven numbers as we might have seen it back in week two when we first introduced arrays. Let me draw our attention to the middle element and then to the middles of the middles and then the middles of the middles of the middles, just by color coding them slightly differently.
If I were to run binary search on these numbers or the lockers that we had on the stage a few weeks back, I would jump to the middle, then the middle of the middle, and so forth. The catch, though, is that implementing it as an array, it's not going to be very easy to add new values. Why? Because if I want to add the number eight or nine or 10, I might get lucky and there might be room in memory here, but I might get unlucky. In which case we've got to start jumping through those hoops of malloc or realloc and copying all of this memory to a new location, which is doable. We solved it in code, but it's going to be slow for larger data sets. So can we avoid that? Well, maybe. I deliberately color-coded things like this because let me propose that instead of storing these seven values in an array, let's store them in a family-tree-like structure like this, where I just kind of exploded them vertically on the y-axis here. So now the middle element, the four, is at the top of this tree. The two and the six, which were the middle elements after the middle, are going to be to the left and right of the four. And then these leaf nodes, so to speak (we borrow a lot of vernacular from the world of actual trees; these are leaves in the sense that they themselves have no children, they're at the edge of the data structure), are going to be the middles of the middles of the middles. But all of the data is still there. I've just exploded it from one to two dimensions. And now that we have this technique of using pointers, which we use in C code but you can depict pictorially with arrows, let me propose that we stitch together these seven values in memory using a bunch of pointers, whereby now each of these nodes, drawn as a single square for simplicity, is going to have not only an integer associated with it and not just one pointer but, per these arrows, as many as two pointers associated with it.
So our nodes are about to go from data structures with two things, a number and a pointer, to three things, a number and two pointers for the left and right child respectively. And I dare say now that we have a two-dimensional tree data structure, consider how you might find a number therein. Suppose I'm searching for the number five. Well, I start at the root of the data structure. And even though our human eyes obviously know where we're going, notice what's important about this binary search tree. If I go to the root of the tree, I see the four. Four is obviously less than five. What does this mean? This means I can divide and conquer the problem right off the bat. I know that five is going to be to the right of this node, which means effectively, if you think in your mind's eye about snipping the branch there, I have just halved the problem, essentially like dividing the phone book in half. Why? Because I don't even waste time looking at this subtree, the left child of the four element. Meanwhile, if I go from the root to its right child here, I see the number six. Five, of course, is less than six. So this is effectively like snipping off that right child, because I don't need to go further there: I know a smaller element is going to be in this direction. And that's the key property of a binary search tree. It's not just a family tree with numbers all over the place. They follow a certain pattern.
Every element is going to be greater than its left child and less than its right child, assuming you don't have identical values. And that property is actually a recursive one, to borrow terminology from a couple of weeks back. Recall that a recursive function is one that calls itself. A recursive data structure, like the pyramid in Mario, is a data structure that can be defined in terms of itself. Well, the binary search tree property is a recursive one insofar as if it applies to this node, it also applies to this node. Case in point: two is greater than one, but it's also less than three. It's true over here: six is greater than five but less than seven. And it's technically true of the leaf nodes because the definition is at least not violated there, because they don't even have children themselves. So this is a binary search tree because of that pattern. So this then invites the question, well, how long does it take us to search for a value in a binary search tree? Well, if the number is five, it's going to take me one, two steps. But if there are n elements here, does someone want to generalize that, either mathematically or just instinctively? Big O of log n. And even if you're not quite sure how the math works out, anytime you take a data set and you halve it, halve it, halve it, we're talking about log base 2 of n again. And indeed, that's going to describe the height of this tree. The height of this tree is essentially log base 2 of n, because if n is seven, it's going to give me essentially two if we round down, or three if we round up. And if we've got eight elements, log base 2 of 8 is three, because 2 to the third is 8. So one, two, three. It kind of works out, even if I'm doing that a bit quickly. The height of this tree is log base 2 of n, aka big O of log n. How long does it take to insert? I think it's going to take log n, because I can insert over here or over here or over here depending on where the number goes. How long does it take to delete? I'll claim it's going to take about the same.
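For anyone who wants the arithmetic behind "the height is log base 2 of n", here is a short worked version, assuming a perfectly balanced tree in which every level is full. The symbol h (the number of levels) is introduced here for illustration and is not used in the lecture.

```latex
% Level 1 holds 1 node, level 2 holds 2, level 3 holds 4, and so on,
% so a full tree with h levels holds
\[
  n = 1 + 2 + 4 + \dots + 2^{h-1} = 2^{h} - 1
  \quad\Longrightarrow\quad
  h = \log_2 (n + 1).
\]
% For the seven-node tree in the example:
\[
  n = 7 \;\Longrightarrow\; h = \log_2 8 = 3 .
\]
% A search visits at most one node per level, so it takes at most
% h = \log_2(n+1) steps, which is O(\log n).
```

That matches the count of "one, two, three" levels in the picture: searching the seven-node tree never needs to look at more than three nodes.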
So wow, we're back in business. I've got now the ability to grow and shrink my data structure, because if I want to insert the number eight, it's going to go right there. If I want to insert a number like 5.5, I can see where I would put it. It's going to be easy to add new nodes by just updating the pointers without copying everything in memory like we had to for arrays. But there is a downside here. I've got to concede something. What price am I paying? What's the trade-off here to gain that dynamism and that speed? >> Each individual node takes more memory. >> Yeah, I'm literally using three times as much memory now, because even though it's not depicted here explicitly, each of these squares represents an integer and a pointer and another pointer. So that's like 16, no, like 20 bytes of memory at this point instead of just four bytes for each of the integers in an array. Nowadays, though, space is pretty cheap. We all have very large Dropbox folders, iCloud folders, and the like. So it's not really a big deal to use that many more bytes. Certainly not a big deal for seven numbers, but if it's seven million numbers, maybe this isn't the best data structure to use, even if speed is important. You've got to decide ultimately, based on your actual use case, what matters more. So in short, a binary search tree you can kind of think of as a variant of a linked list, except that every node has as many as two pointers instead of one, which is what gives us now this second dimension. And in fact, this translates pretty nicely to code. If we consider how we implemented a node in a linked list, recall that it looked like this, where you've got a number in each node and a pointer to the next element in the linked list. Well, I think for a binary search tree, we can sort of borrow this as inspiration and make a little more room, because we need two pointers instead of one.
And I'm just going to call them the left pointer and the right pointer, for the left child and the right child. But here is the three times as much space, give or take, because I now have three elements associated with each node: two pieces of metadata and one piece of data that I actually care about, to stitch this thing together. All right. Well, if this is the data structure, how could I implement this in code? Well, here's where recursion again comes into play. The fact that a binary search tree is recursive in nature, in that what you say about this node, about it being greater than the left child and less than the right child, can be said of this node and this node and this node and this node, is something you can leverage beautifully in code like this. So suppose I'm implementing a search function in C whose purpose in life is just to say yes or no, true or false, the number you're looking for is in this tree, which might be a useful thing to check in an algorithm. Search is going to take two arguments, I propose: the number you're searching for and a pointer to the tree, that is, the root of the tree initially. So how do you actually traverse this thing in C code? Well, we can pluck off the easy case first, the base case: if the tree itself is null. Like, if you hand me nothing, I'll give you your answer right now. False. There's no number here if the tree is empty. So that's easy. Otherwise, if the number you're looking for is less than the number in the current node (tree is what's passed in, a pointer to the root, so if you follow the arrow, you can get inside of that node and see its number), okay, you want to what? Snip off the right subtree and dive down the left subtree. So you search the tree's left child for the same number. Else, if the number you're looking for is greater than that number, you search the tree's right child for that same number. And the fourth and final scenario is what?
Well, if the number you're looking for equals the number in the current node, you got it. Return true. And if you recall some of our past design discussions, it's sort of a waste of everyone's time to ask this last question explicitly. Let me tighten this up design-wise, because there are only four possible scenarios. Either there's nothing there, it's to the left, it's to the right, or you found it. It's right there. So whether or not you agree at this point in your programming career, there is a beauty to this code that most programmers would claim is here, in that it's so relatively elegant. You've defined what the function is. You've got this base case, which is arguably one of the clunkiest parts. But the fact that you can just check a value here and then traverse the exact same structure, but a subset of it, by traversing the left subtree or the right subtree, is like a beautiful application of recursion. And it allows you to search for this thing no matter where it is in the computer's memory. Questions then on this idea of a binary search tree or this actual code thereof? >> And if you don't ask the question, if the number is not there? >> Nope. If the number is not there, recall: if we get all the way to the bottom of the tree, such that now I'm at one of those leaf nodes and that's not the number I'm looking for, such that there's no left child left, no right child left, this first conditional is going to kick in and I'm going to return false. But if I find it along the way, whether it's at the top of the tree or somewhere in the middle or among the leaves, I will eventually return true. Good question. And to be clear, even though I'm calling this a tree, that's true certainly for the first time I call this function, because I'm passing in a pointer to the whole tree structure. But if you think about it, what's the left subtree and the right subtree? It's just a smaller tree.
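The four scenarios walked through above can be sketched in C roughly as follows. This is a reconstruction from the spoken description, not a copy of the slide: the field names `left` and `right` follow the lecture, while the argument order of `search` is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

// A binary search tree node: one number plus as many as two children.
typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Returns true if number is somewhere in the tree rooted at `tree`.
bool search(node *tree, int number)
{
    // Base case: an empty tree (or an empty subtree) contains nothing
    if (tree == NULL)
    {
        return false;
    }
    // Smaller than the current node: only the left subtree can hold it
    else if (number < tree->number)
    {
        return search(tree->left, number);
    }
    // Larger than the current node: only the right subtree can hold it
    else if (number > tree->number)
    {
        return search(tree->right, number);
    }
    // Neither smaller nor larger, so this node is the match
    else
    {
        return true;
    }
}
```

Each recursive call hands `search` a pointer to a child, which is itself the root of a smaller tree, so the same four checks apply at every level.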
It's like a baby tree that's attached to this parent node, so to speak. So it's perfectly reasonable to just call the search function with that child, because it in turn has a whole subtree below it, or with the right child, which has a whole subtree below it instead. All right. So I like this direction. We've now kind of improved upon linked lists. We've gained back some of our performance because we can now find something in big O of log n time. I don't love the fact that I'm using three times as much memory, roughly. That feels like kind of a high price to pay just to speed things back up. But let's consider whether or not this thing is actually going to work as the data structure gets bigger and bigger as well. So it looks beautiful here as written, and that's deliberate, because I drew the picture like this and it's got seven elements in it. But how did we get to seven elements? Let's start from the beginning. Suppose that the tree is initially empty, and suppose that a human using get_int or some other technique inserts the first element into the tree, like the number two, and the goal is to maintain the binary search tree property, which means each node has got to be greater than its left child and less than its right child. So suppose the human, using get_int or some other technique, next gives me the number one. No big deal. I plop it right there as the left child. Suppose they give me the number three next. No big deal. It goes right there. I have very deliberately manipulated this story to work out beautifully, such that the tree is smaller but it's still a binary search tree and nicely balanced, so to speak. But what if the user, for whatever reason, just gives me a more perverse sequence of inputs, like the worst case scenario for three elements? Suppose they give me one first. Okay, that's the root. Then they give me two. Okay, that's cool. That's like the right child. But what if they then give me three? Well, to maintain that binary search tree property, the three has to go over here.
Suppose, perversely, they then give me four, then five, then six. Imagine in your mind's eye where this story is going. What have I accidentally created in memory? A linked list, which is bad for all the reasons we discussed before the break, because even though we're getting the dynamism, it's devolving into big O of n. So I've kind of manipulated the situation here, with the original example with seven elements and then with three elements, by making sure that they were inserted in just the right order. Because unless you are clever about how you build the tree in memory, it could very well devolve from a tree in two dimensions into what is actually a linked list in one dimension. And now this is just a long and stringy tree that does not violate the binary search tree definition, but it is surely not balanced in this case. Now, as an aside, if you take higher-level courses in data structures and algorithms, there are many different alternatives to binary search trees that actually have baked into the algorithms a little bit of rejiggering of the structure, so that really as soon as you insert this three, you spend a little bit more time and clean the situation up. And essentially what you do is pivot the thing around this way so that two becomes the new root, and then one hangs off of it and three still hangs off of it. So with each insertion or deletion, you rebalance the tree as needed, which does cost you a bit more time, but it avoids the thing devolving into big O of n again. And we won't do that in code. So this is recoverable, but not if you implement it naively, as I did, at least verbally, in this story. All right. Well, can we do better than that? Well, why might we want to? Well, at this point in the story, it certainly could devolve into big O of n, and that's not great. Certainly for large data sets, it's nice that we're back to log n.
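The "perverse" insertion order described above is easy to reproduce in code. The following is a minimal sketch of a naive insert with no rebalancing; the function names `insert`, `height`, and `free_tree` are hypothetical, and ignoring duplicate values is an assumption made here for simplicity.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Naive insertion: walks down to the first empty spot that keeps the
// binary search tree property and never rebalances.
// Returns the root of the (sub)tree; duplicates are ignored.
node *insert(node *tree, int number)
{
    if (tree == NULL)
    {
        node *n = malloc(sizeof(node));
        if (n == NULL)
        {
            return NULL; // out of memory: leave this spot empty
        }
        n->number = number;
        n->left = NULL;
        n->right = NULL;
        return n;
    }
    if (number < tree->number)
    {
        tree->left = insert(tree->left, number);
    }
    else if (number > tree->number)
    {
        tree->right = insert(tree->right, number);
    }
    return tree;
}

// Number of levels in the tree (0 for an empty tree).
int height(node *tree)
{
    if (tree == NULL)
    {
        return 0;
    }
    int l = height(tree->left);
    int r = height(tree->right);
    return 1 + (l > r ? l : r);
}

// Frees every node, children before parents.
void free_tree(node *tree)
{
    if (tree == NULL)
    {
        return;
    }
    free_tree(tree->left);
    free_tree(tree->right);
    free(tree);
}
```

Inserting 2, 1, 3 in that order yields a tree with two levels, while inserting 1, 2, 3, 4, 5, 6 in order yields six levels, one node per level: the stringy, linked-list-shaped tree from the story.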
At least if you take on faith that we could kind of rebalance this thing as needed and maintain a logarithmic height for it. But really, the holy grail of data structures is to achieve something that is big O of one, like constant time, whereby no matter how many numbers or names or sweaters are in the data structure, it will take just one step, or maybe three steps, or even 100 steps, but a number of steps that is completely independent of how many actual pieces of data are in the data structure. That is to say, over time it doesn't get any slower, even if you've got tens, hundreds, thousands, millions of elements in there already. So how do we gain something like big O of one, constant time? The appeal of which is reminiscent of our early picture from week one. This was our early algorithm for finding someone in a phone book or counting students in the room, something linear, literally straight lines. This was the logarithmic curve, which, especially as you zoom out, starts to get very, very appealing time-wise. Something that's constant time looks even prettier. It is a flat line at, like, the one-step mark or the two-step mark, whatever the constant number of steps is. And even though logarithmic will still grow in perpetuity, constant time by definition never changes. And this is what we'd really like. So when you're searching for someone in your phone, you're searching for something on Google, you're asking a question of ChatGPT, you get an answer like that, in constant time, independent of how much data is actually in there. Well, let's see how we can do this. To do this, we're going to at least need a new building block, a term of art known as hashing. Hashing, sort of formally, takes an infinite domain of values and maps it to a finite range of values. So from high school math class, domain is the input, range is the output. So an infinite domain to a finite range is the goal here of hashing.
And we might see this actually in the real world when you're playing, you know, games or whatnot, or you're cleaning up after a game. Like, here are some super jumbo playing cards that we got online. And suppose that you want to just get these into sorted order. You could do this very painstakingly. There are 52 cards here. You can kind of lay them all out and start sifting through them and put the two over here and the four over here and the hearts and the clubs and so forth. Or you can start to look at the cards and bucketize them first, to take a 52-size problem and maybe shrink it down into four 13-size problems. So here for instance is where the first diamond might go, the club here, spade over here, diamond over here. And I can kind of just do this again and again, bucketizing literally all of these values, so that I've got a very simple heuristic that allows me to move the cards into these buckets, each of which is going to have a subset of the values, and then I've got smaller problems I can deal with. So, dot dot dot, assume that I bucketize all 52 of these values. Then I've just got four problems remaining. And I dare say it's a little easier then, because they're all of the same suit, and so I can pretty easily sort them from ace to king or whatnot, because those are effectively just numbers at that point. So hashing refers to, again, taking values from an infinite domain. In this case, it can be finite, and it is in this case. But if you were doing it more generally with numbers, you just have to map it to a finite range, like one, two, three, four, a finite number of buckets of values, at which point you can solve the problem a little differently or a little more efficiently. So why is this germane? Well, I would propose that if we want to start organizing our data in memory toward an idealistic goal of achieving constant time, hashing might be one ingredient for the solution there too.
And generally, we're going to describe the process by which you decide what input goes to what output as what's called a hash function. It's a mathematical function, or a function in code, that takes as input a card from a deck or maybe a word from a dictionary and outputs a value that represents the bucket into which it should go. So in the case of our contacts app, for instance, of course in the GUI of it you have all of your friends and family top to bottom, alphabetically presumably. You might want to find someone quite quickly, ideally in constant time, right? The naive implementation that Apple or Google could use is just linear search. Search through all of your contacts top to bottom, and eventually you will correctly find the person. But wouldn't it be nice if they instead used an array, so that they could use binary search and get you the person in logarithmic time? That's great. But if you have a lot of friends and family in there, or a much larger data set, wouldn't it be nice to just jump to the answer in one step instead of even log of n steps? So that's our goal. Can we get close to, or actually at, constant time? So with a hash function, we essentially have our old friend, problem solving, here, the inside of which, the algorithm, is known as a hash function. And for instance, if I'm looking for Mario's number, I might now want to look for Mario not top to bottom and not by divide and conquer, jumping around to the half, the middle of the middle of the middle. Let me just figure out what bucket Mario is in. And in the English alphabet, there are 26 letters, A through Z, either uppercase or lowercase. And suppose that I want to find what bucket Mario is in. Well, much like these cards and the suits thereof, wouldn't it make sense that anyone whose name starts with A goes into the first bucket, and maybe the B's go into the second bucket, and, dot dot dot, the Z's go into the last bucket?
So, it stands to reason that if I pass in Mario to a hash function implemented in C or some other language, I would like to get back the number 12, because M is the 13th letter of the alphabet, but if we start counting at zero with our buckets, which are essentially an array, then it's index location 12 instead of 13. Similarly, if Luigi is the input, I'd like to get back the number 11. So my hash function somehow takes as input, in this story, a string, and gives me an integer. I claim there's theoretically an infinite number of names in the world in the English language. But there are only going to be 26 possible answers from this hash function, 0 through 25. So that's our infinite domain to our finite range. Instead of four, it's now 26. All right. So what should we do with the computer's memory to leverage the fact that we can very easily bucketize names based on the first letter of someone's name? Well, let me propose that the hash function part of this, arcane as it looks, is actually pretty straightforward. So if you wanted to translate this idea into C, you can include ctype.h, which we've used a few times to get access to functions like toupper. And this is just to make sure you can be case insensitive. Here's my hash function. It's going to return an int, which is the goal. It takes a string as input. We'll call it name. And what does this function do? Well, it's kind of some clever arithmetic. It first converts to uppercase the first letter of that person's name. So if it's in all lowercase, it forces it to uppercase. Why? Because I want to subtract, no matter what, 65, aka the ASCII value of capital A, from this. And I don't want to screw up the math if I'm doing, like, a lowercase letter minus a capital. I want capital minus capital, is all. So this will return to me a number between 0 and 25 inclusive, because if it is a name that starts with A, and I'm only looking at the first letter,
I'm subtracting off A, which gives me zero, and I'm going to return zero as a result. Dot dot dot. If it's Z, I'm going to return 25 instead. Now, there's no error checking in here. If you type in non-English symbols, it's going to break. So let's just assume for simplicity that this is indeed an English name that's coming in. I can refine this a little bit. I'm going to propose, moving forward in our final week here of C, that there are some added defenses you can put in place when writing code. Like, if you know that you're receiving a name as input, that is, you're being passed something by reference, there's a danger now, per last week, because now the caller of this function, whoever's using this function, is telling you where to find Mario's and where to find Luigi's name. The problem with that is that you could go to that address and actually change their name in memory, even if you're not supposed to. You're supposed to just use the name. So you can do something like const, which says you should not be able to change this value, even though I'm not giving you a copy of it by value; I'm giving you a reference to it. Another refinement here is that a hash function whose goal is to index into an array should return a value that's zero or one or two on up. Never negative. So we can even more protectively say it's not just an int, it's an unsigned int. And we talked briefly about that last week, albeit in the context of chars. These are just minor improvements that make your code arguably better designed, because you're opening yourself up to fewer possible mistakes or issues. All right, so with that said, let's now assume that we've got this kind of function implemented and we can now use it to decide what bucket to put these people's names into. Well, let's give you what are called hash tables, which are sort of the Swiss army knives of data structures.
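Putting the first-letter idea and the two refinements (const and unsigned int) together, a minimal sketch of such a hash function in C might look like this. It uses a plain `const char *` rather than the CS50 library's string type so that it is self-contained, and writing 'A' instead of 65 is a stylistic choice made here; like the version described above, it assumes the name starts with an English letter and does no error checking.

```c
#include <ctype.h>

// Maps a name to a bucket from 0 to 25 based on its first letter.
// Assumption: name is non-empty and starts with A-Z or a-z.
unsigned int hash(const char *name)
{
    // Force the first letter to uppercase so that the subtraction is
    // always capital minus capital, then subtract 'A' (ASCII 65).
    // The cast to unsigned char keeps toupper well-defined for any char.
    return (unsigned int) (toupper((unsigned char) name[0]) - 'A');
}
```

With this sketch, "Mario" and "mario" both land in bucket 12, and "Luigi" lands in bucket 11, matching the numbers worked out above.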
They're the kind of thing that some computer scientists have been quoted as saying, if they were stuck on a desert island with only one data structure, is probably the one they would want. Why? It's just really generally useful, because it allows you quite powerfully to associate keys with values. Which is to say, to come full circle today, hash tables are often how you would implement, at a lower level, the thing we began class talking about: dictionaries, collections of key-value pairs. That, after all, is what a phone book is. We call it, you know, names and numbers, but it's keys and values. That's what an actual English dictionary is. The Oxford English Dictionary is a bunch of words and definitions, or keys and values. So it's useful in general to be able to associate one piece of data with another. Ergo, hash tables. So here's how you might implement a hash table in C. You want it to be of size 26, for instance. So 26 buckets from A to Z, hence the 26. You want this to be an array, and that's fine. This was an array of four buckets. I'm going to use an array of 26 buckets, because a hash table, too, is going to be an evolution of our linked list mashed together with an array. So a hash table in short is going to be an array of linked lists, as we'll soon see. Here's the array: 26 pointers to nodes. So I'm going to give myself an array of pointers that is going to store ultimately a whole bunch of person objects like this. So for instance, here's a char star name and a char star number, as we've discussed in the past, representing a person. These are the pieces of data I might want to store in this data structure. However, let's simplify. Let's not worry about the phone number, because we're not going to call anyone today. But for a linked list of persons, I'm going to need to store, let's say, the person's name, but also a pointer to the next such name, to the next such name, to the next such name. So again, I'm just deleting number as being unnecessary detail.
But if we're going to have an array of linked lists, this is our new definition of node for this part of class, whereby it's not for a tree. It's now for a hash table. And we'll see this in action now. Here is my array of size 26. I drew it vertically, but who cares? These have always been artist's renditions thereof. It just fits nicely on the screen this way. This is location zero. This is location 25. So any A names should end up over here, any Z names should end up down here, and so forth. Let's just generalize this away as letters of the alphabet for clarity. That's where all the names are going to go. So hopefully Mario here, Luigi here, and everyone else. So what are each of these squares? They're just pointers to nodes. Initially all null, I'll claim. But as soon as I insert Mario into this so-called hash table, I'm not going to put him literally here. I'm going to create a new node in memory, put Mario there, and then stitch it together. Because if I get another M name, I'm going to stitch it together and together and together again. So for instance, here comes Mario into this data structure. So this is a pointer to a person structure. Here's Luigi. And here's a third character as well, Peach. That's all working out great. Dot dot dot. There's a whole bunch of characters in the Nintendo universe. Here's a lot of them. Unfortunately, especially if you're a fan, there are also other names that do start with M and L and other letters of the alphabet. So we're poised to have what we're going to call collisions, which is a downside of using a hash function. If you're going from something infinite to something finite, by definition you're going to have a heck of a lot of potential collisions. Multiple M names, multiple L names, and so forth. So we've got to mitigate this somehow.
Well, if you meet someone in the real world whose name happens to start with M, and you already are friends with Mario, well, you could delete Mario from your phone and put that new person there. But that's kind of dumb. You could clobber the value, that is. Or maybe you put the M friend here. And when that fills up, you put the next M friend here. And then when you meet someone else whose name starts with M, you put them here. But then it just devolves into this mess, at which point there's no rhyme or reason as to who is where. It devolves back into something linear if you have to search the whole darn thing looking for M friends just because you ran out of space where you wanted it. So here's the beauty of mashing together an array with a linked list. You hash the name to the intended location, like box 12 here. And then you just start stringing names together in a linked list. And hopefully you don't have too many of those collisions, but at least now you don't have to delete anything or make a mess of the data structure. So here's another bunch of names, three starting with L. Here's a bunch for the other letters of the alphabet. And it's an array now of linked lists. This then is a hash table. So the question to consider now is, is this better than an array? Is this better than a linked list? Well, I dare say it's better than a linked list, because if it were a linked list from A to Z, what would be the running time of searching for anyone? Well, I'll spoil it. Big O of n. Because even if it's alphabetically sorted, you've got to start at the beginning and go all the way through the list, potentially, to find someone like Zelda, whose name starts with, of course, Z. But here we have an array of linked lists. So what's really the running time here?
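The array-of-linked-lists picture described above can be sketched in C as follows. The node layout (a name plus a next pointer) and the 26 buckets come from the lecture; the function names `insert` and `contains`, the constant `BUCKETS`, and the choice to prepend new nodes to the front of each chain are assumptions made for this illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS 26

// One link in a chain of names that hashed to the same bucket.
typedef struct node
{
    const char *name;   // not copied: the caller's string must outlive the table
    struct node *next;
} node;

// The hash table: 26 pointers to nodes. Because it is declared at
// file scope, every pointer starts out as NULL.
node *table[BUCKETS];

// First-letter hash, assuming the name starts with an English letter.
static unsigned int hash(const char *name)
{
    return (unsigned int) (toupper((unsigned char) name[0]) - 'A');
}

// Adds a name to the front of its bucket's chain.
// Returns false only if memory runs out.
bool insert(const char *name)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    unsigned int i = hash(name);
    n->name = name;
    n->next = table[i];   // new node points at the old head of the chain
    table[i] = n;         // bucket now points at the new node
    return true;
}

// Jumps straight to one bucket, then searches only that chain.
bool contains(const char *name)
{
    for (node *ptr = table[hash(name)]; ptr != NULL; ptr = ptr->next)
    {
        if (strcmp(ptr->name, name) == 0)
        {
            return true;
        }
    }
    return false;
}
```

A collision, such as Luigi and Link both hashing to bucket 11, simply makes that one chain longer; nothing is clobbered and no other bucket is disturbed.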
It's not quite as bad as n steps, because if you assume a uniform distribution of names, such that the world of Nintendo maybe has as many M names as L names as A names as B names, you could assume that there's a bunch of chains, a bunch of linked lists here chained together, but they're all roughly the same length. So maybe you have n names in your phone book this way, but these lists are each only 1/26 of that length, because the names are spread across 26 buckets. So what's the running time? Well, ideally we'd move away from linked lists with big O of N and achieve our constant time. But we have these collisions to worry about here. Just to be clear, we want to get from big O of N to something constant time, but we're not going to get to constant time if we've got collisions. If we've got three L names and a few B names and a few A names, we can't just jump to that location and find the person we're looking for. So, what's the fundamental goal? Well, I think we want to maybe use a smarter hash function. And here depicted is an excerpt from a bigger hash table, that is, a much bigger array, that assumes that you're not looking at the first letter of everyone's name but, apparently, instead at the first three letters of the person's name, which just decreases the probability of collisions, because in this model, I dare say there's no one else's name in the Nintendo universe that starts with L-I-N. So now Link has its own location in memory. And similarly for Luigi: L-U-I, I believe, is unique in the Nintendo universe. So we don't have a collision. Unfortunately, while this does seem to eliminate collisions based on this tiny example, what's the trade-off or what's the catch? Yeah, >> Use a lot more memory. >> This is a lot more memory. I mean, I kind of hinted at that by the fact that I didn't even fit most of it on the screen anymore. Here's L-A. Here's L-U.
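To make the memory trade-off concrete, here is a small sketch comparing a first-letter hash with a first-three-letters hash. The function names are hypothetical, and treating the three letters as a base-26 number is just one plausible way to index such a table; it assumes names of at least three letters, all A through Z.

```python
LETTERS = 26


def hash_first_letter(name):
    """Bucket 0..25 from the first letter only."""
    return ord(name[0].upper()) - ord("A")


def hash_first_three(name):
    """Treat the first three letters as a base-26 number: 0..17575."""
    index = 0
    for letter in name[:3].upper():
        index = index * LETTERS + (ord(letter) - ord("A"))
    return index


print(LETTERS ** 1)   # 26 buckets for one letter
print(LETTERS ** 3)   # 17576 buckets for three letters

# Link and Luigi collide on the first letter but not on the first three.
print(hash_first_letter("Link") == hash_first_letter("Luigi"))   # True
print(hash_first_three("Link") == hash_first_three("Luigi"))     # False
```

Going from one letter to three multiplies the number of buckets by 676, and most of those buckets (AAA, AAB, and so on) would likely stay empty.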
But what about all of the other letters of the alphabet and the other combinations of, dot dot dot, all possibilities? Moreover, some of these just don't make much sense. At least in English or in the Nintendo world, I don't think there's anyone whose name is going to start with AAA or AAB or AAC or AAD or AAE and so forth. We're wasting a huge amount of space to reduce the probability of collision. So that's fine. We might get constant time now, but at what cost? Well, a heck of a lot more memory. And so this is one of the tensions when using a hash table: you want to come up with a good hash function that's maybe a little more sophisticated than the first letter, but not so wasteful that you need a crazy number of buckets and therefore a huge amount more memory. So really, even with collisions, it's not quite as bad as n steps, because technically if you have k buckets, where k is like 26 buckets or four in this case, and if you do assume that the names are uniformly distributed over A through Z, the English alphabet, well, each of those linked lists is going to be hopefully no bigger than n / k. So n / 26. But what do we know about higher order terms when doing big O notation? Big O of n / k. Yes, it's faster, but asymptotically, that is, theoretically, you're still talking about big O of n. So here's the tension, though: it's absolutely going to be faster. It will be like 26 times faster than a linked list, but it's still just big O of n, because it's going to take an amount of time that's still linear in the size of the data set. So we seem to have strayed yet again away from our constant time search. So can we find this holy grail? Well, we kind of can, if you let me spend just a lot more space. There are tries in the world, which, weirdly, is short for retrieval, even though we don't say re-try-val, but a trie is a tree made out of arrays, right?
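The n / k argument can be checked with a little arithmetic. The sketch below assumes the perfectly uniform distribution described above; worst_case_steps is a hypothetical helper name, and the sizes are made up for illustration.

```python
def worst_case_steps(n, k):
    """Longest chain when n names are spread evenly over k buckets (ceiling of n / k)."""
    return -(-n // k)


for n in [26_000, 52_000, 104_000]:
    one_list = worst_case_steps(n, 1)      # a single linked list
    hash_table = worst_case_steps(n, 26)   # 26 chains
    print(n, one_list, hash_table, one_list // hash_table)
```

Each row shows the hash table needing 26 times fewer steps, yet doubling n still doubles the steps in both columns. Dividing by a constant k changes the speed in practice but not the linear growth, which is why big O of n / k is still big O of n.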
So, at some point, computer scientists were just mashing things together Frankenstein style, like linked lists and arrays, and now we've got trees and arrays. You too can mash something together and come up with your own. Let's look at what a trie actually is, because it is going to get us that constant time grail. So here is the root of a trie. You can think of each node in a trie as really being an array of values A through Z, in the case of an English problem like we've been playing with here. And what you do is you treat this array as being indexed from 0 through 25 or, equivalently, A through Z. And you treat each of those elements as a pointer to another such node in the trie. And what you do is implicitly store the names that you're storing in this data structure by going to an appropriate location based on the first letter in their name and then adding a pointer that represents the second letter in their name, adding a pointer that represents the third letter of their name, and so forth. So what do I mean by this? Suppose we want to insert Toad, one of the characters from the Nintendo universe, first. If we count up where T is in the alphabet, this pointer here will be changed from null to a pointer to a new node that represents the second letter in Toad's name, which is going to be, of course, O. Then to insert T-O-A, we're going to need another node. A is going to lead me to D. And for depiction's sake, I'm going to draw in green, even though this would actually be a boolean or something like that in memory, that indicates that Toad's name stops here. So in other words, this trie in memory has four nodes now. Each of those nodes is essentially an array of size 26. But the word Toad is not actually stored in the data structure explicitly. There's no char * for Toad, but implicitly it is, because the T pointer is non-null, the O pointer is non-null, the A pointer is non-null, and the D pointer is in fact null at this point. That is the common technique here.
This allows me to insert other names from Nintendo's universe, like Toadette, because I can continue from here to go to the E node, to the T node, to the T node again, and an E node, which I'll again mark in green. So you can even have names that are substrings or, equivalently, superstrings of each other by just having all of these various breadcrumbs along the way, where again a non-null pointer here, to a non-null, to a non-null, to a null pointer here indicates that, or rather, it can't be null at this point. This is where we have to use a boolean, which indicates that there is a name in this data structure that ends here and there's another name that ends here. Meanwhile, if there's a third name from the universe like Tom, same idea, but eventually we can start reusing some of these arrays, whereby non-null, non-null, null, or there's a boolean flag here that says true, a name ends here. Now we're reusing that same array. So each of the nodes represents the i-th letter of the word or the name you're trying to store in the data structure. And by playing around with null and non-null and some booleans, you can implicitly store names in this structure. Now, it's way too pictorially difficult to depict lots and lots of names in this form. So, just imagine in your mind's eye that there's dozens, hundreds, thousands of names now in this data structure, but just more arrows and more arrays. How do you actually look someone up in this data structure? Well, if you want to ask a question like is Toad in this data structure, or is Toadette in this data structure, or anyone else, you can simply start at the root node, as we would do for any tree, and you hash on the first letter of Toad's name, which gives you this location, and you check: is it null? If not, T is implicitly there. So, you follow that pointer here, and then you hash the second letter of Toad's name, an O, and check this pointer. And you follow that arrow.
Then you hash on the third letter of Toad's name, A, and you follow that arrow. Then the fourth letter of Toad's name, D, and you see, ah, there's a boolean here, represented in green, that means Toad is in this data structure. And notice what's subtle here. It doesn't matter if there's three names in this trie or three million names in this trie. How many steps did it take me to confirm or deny that Toad is in this trie? One, two, three, four, which is arguably constant. Even though the names can vary, at some point there's no Nintendo name longer than what, like 10 characters, 20 characters, maybe 30. I mean, there's some reasonable bound that is finite, where there's never going to be a name longer than that, because Nintendo's never going to come up with a crazy long name for a game. And so, you effectively have constant time for looking up T-O-A-D, Toadette, Tom, Mario, Luigi, Peach, any of the other names we've looked at. So this is to say, a trie allows you to ask questions like is Toad in this data set or, equivalently, what is Toad's phone number in this data set, because if you assume now that each of these pointers ultimately is not just a bool saying yes or no, but maybe it's an actual person structure with a name and a number, you can store even data like that: your key-value pairs, where your names are your keys and your phone numbers are your values. To make this more clear, then, here is how we might represent in C each of these nodes. It's not quite technically just an array. It contains an array of size 26. We'll call it children, because it represents the children of that node, of type struct node *. And then here, for instance, for simplicity, is that person's number, if we reintroduce numbers and want to store in this data structure someone's phone number as well. So using that data structure and that kind of code, you can implement a trie using something as simple as this. Initially your trie is just a pointer to a node,
one such struct. We can of course initialize it to null to make clear that there's no names in here. But each time we allocate a node, we can then add another node, another node, hashing on the first, the second, the third, the fourth, dot dot dot, the last character in the person's name, allocating a node as needed, flipping that boolean to true or false, or adding their phone number as a char * to indicate that we have then found them. And so of all the data structures we've looked at today, big O of one is actually achieved with tries. And yet curiously, for problem set five, you're not going to implement tries, you're going to implement hash tables, that sort of Swiss Army knife of data structures that like every programmer everywhere knows about. Why? Like, why not use tries very often in practice? Perhaps you certainly can, but what's the trade-off perhaps? Yeah, >> Take up too much memory. >> It's a huge amount of memory. Things have escalated since the start of class. We started with one int. Then we added an int and a pointer, and an int and two pointers. Now I'm proposing 26 pointers plus a boolean or a data structure called person. I mean, it's escalating significantly. And the biggest catch with a trie, as you might have imagined with Toad and Toadette and Tom on the screen: there's a huge amount of wasted memory, just as we saw with a hash function potentially, but that can be reined in, as you'll explore in the problem set. With a trie, most of the pointers in those arrays are just null and unused, and it just tends to result in your using way more memory to solve the problem correctly, but in a way that tends to slow the computer down and just waste more memory than is useful. That said, just as we started today, there are stacks in the real world. There's queues in the real world. There are even hash tables in the real world, which you'll indeed implement in code for problem set five.
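The insert and lookup steps described for a trie can be sketched as follows. This is a Python illustration under stated assumptions, not the C struct shown in lecture: TrieNode, index_of, insert, and lookup are hypothetical names, names are assumed to contain only the letters A through Z, and the phone numbers are placeholders.

```python
ALPHABET_SIZE = 26


class TrieNode:
    """One node: 26 child pointers plus an optional phone number."""

    def __init__(self):
        self.children = [None] * ALPHABET_SIZE  # all "null" to start
        self.number = None  # set only where a complete name ends


def index_of(letter):
    """Hash a single letter to 0..25 (assumes A-Z, either case)."""
    return ord(letter.upper()) - ord("A")


def insert(root, name, number):
    """Follow one pointer per letter, allocating nodes as needed."""
    node = root
    for letter in name:
        i = index_of(letter)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.number = number  # marks that a name ends here


def lookup(root, name):
    """Take at most len(name) steps, however many names are stored."""
    node = root
    for letter in name:
        node = node.children[index_of(letter)]
        if node is None:
            return None
    return node.number


root = TrieNode()
insert(root, "Toad", "555-0100")      # placeholder numbers
insert(root, "Toadette", "555-0101")
insert(root, "Tom", "555-0102")

print(lookup(root, "Toad"))    # 555-0100
print(lookup(root, "Toa"))     # None (a prefix, but no name ends there)
print(lookup(root, "Mario"))   # None
```

Here the stored number doubles as the "a name ends here" marker. The cost of a lookup depends only on the length of the name, and the memory cost is visible too: every node carries 26 slots, most of which stay empty.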
Has anyone here ever had a salad from a restaurant called Sweetgreen in Harvard Square? Also elsewhere in the US. Like, not one, two, like two of us, three of us. Okay, so not hard to imagine going to such a store, getting in a queue, and staring at a shelf like this, because what Sweetgreen and similar restaurants do when you order for pickup is they hash your salad into a shelf like this. And so literally in Sweetgreen might you see some wooden shelves like this. This is the A through E bucket, the F through J bucket, the K through N bucket, and the O through Z bucket, whereby if your name, like Malan, happens to be in one of those ranges, they will hash my salad and put it here. But of course, even in the real world, there are some constraints. And what can go wrong with this here hash table system? Someone who's been there, maybe: what can go wrong? Imagine the extreme, lots of values here. Yeah. So there's no more space, right? And this has happened to me in the past, especially since Sweetgreen, before adopting this system, used to put the A's here, the B's here, the C's here, the D's here, and so forth. And then someone at some point realized that they were very frequently overflowing the A's to the B's and the B's to the C's, and no one was using Q or Z with any frequency. And so they were sort of wasting space and running out of space. So at some point they decided to, like, literally remove most of the letters of the alphabet and make the buckets bigger and fewer. So now it's very unlikely that you're going to have so many K's through N's that you overflow the shelf. But this is, in the real world, a data structure like we've seen today. And so therefore among the goals, even as arcane as things seem to be getting with all the pointer notation and dereferencing this and that, really all we're doing in code is implementing real-world solutions that other people have already come up with and translating them to a new domain.
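The pickup shelf is a hash function with four uneven buckets. A minimal sketch of that mapping, using the four ranges named above; shelf_for is a hypothetical name, and names are assumed to start with a letter A through Z.

```python
SHELVES = ["A-E", "F-J", "K-N", "O-Z"]
LAST_LETTERS = "EJNZ"  # the last letter each shelf accepts


def shelf_for(name):
    """Return the shelf label for a name, based on its first letter."""
    first = name[0].upper()
    for shelf, last in zip(SHELVES, LAST_LETTERS):
        if first <= last:
            return shelf
    raise ValueError("name must start with a letter A-Z")


print(shelf_for("Malan"))   # K-N
print(shelf_for("Zelda"))   # O-Z
```

The ranges are deliberately unequal in width (O through Z covers twelve letters), which fits the observation that some letters were rarely used while others overflowed.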
And the very last thing you'll do in C this week is indeed implement your very own spell checker, whereby we'll give you a very large file of 100,000 plus English words. You'll have to come up with a clever and efficient way to load it up into memory. And we'll give you tools that will actually measure how fast or how slow your code is, how much memory or how little memory your code uses, so as to actually compare it against not just your own but perhaps others' as well. So with that said, we'll end a bit early today. We'll see you next time. [applause] [music] >> All right, this is CS50, and this is already week six, wherein we transition away from C to a programming language called Python. And that's not to say that the past several weeks haven't been among the goals of the course. Indeed, in learning C, I very much think that you'll have at the end of this class so much more of a bottom-up understanding of how computers work, of how programming languages work. And in particular, you'll appreciate and understand better how Python and Java and C++ and Swift and so many other languages are actually doing their thing nowadays. But recall that we started with Scratch some weeks ago. In Scratch, what was nice was that the first program we wrote, hello world, was just all too accessible. All you had to do was interlock two puzzle pieces in order to make the cat in that case say hello world. Well, thereafter, of course, we transitioned to C. And recall that in week one, we asked you to take on faith that you can sort of ignore that first line and a lot of these parentheses and the curly braces and really just focus on the essence of the program, which clearly is still about hello world and printing it, albeit using a different function and a bit of new syntax.
Today, very excitingly, all of that is truly going to go away and be distilled into a single line of code when you indeed want to have the computer say something like hello world. And this is what we mean by Python being a higher level language. So, humans over the decades learned from earlier designs, earlier programming languages, what worked well, what did not. Computers got faster, computers had more memory, and so you were able to start spending more of those resources in order to have the computer do more for you. And so, you don't need to be as pedantic syntactically anymore. You don't need to write as much code anymore. And frankly, you can just start solving problems of interest to you, building products of interest to you, so much more readily by choosing the right tool for the job. And so in the real world, if you continue coding after CS50, sometimes C will be the right tool for the job, sometimes Python will be the right tool for the job, and sometimes it's going to be a different language altogether that you'll never have studied in school. And in fact, what's compelling, I think, about this week six, much like when I took the class back in the day, is that after CS50, you'll have a taste of one, two, maybe a few different programming languages. And that's going to be enough to bootstrap yourself and teach yourself new languages, because you're going to start to recognize in the real world similarities with past languages that you've seen, programming paradigms that are still sort of with us. And the syntax, yeah, that's invariably going to change, but that's the stuff that you are going to Google or ask ChatGPT or some other AI about down the line. So long as you know enough of it to sort of get real work done, you'll focus mostly, ultimately, on the ideas and the problems you want to solve and less on the syntax.
And so among the goals for this week and this week's problem set and really the rest of the course is to get you more comfortable feeling uncomfortable in front of your keyboard, because we're not going to give you and tell you everything you need to know for a language like Python. You're going to turn to the documentation. You're going to turn to the duck, and you're going to learn to teach yourself, ultimately, a new language. So let's actually write our first program and compare and contrast with how we might do that in C. So recall that in C we were in the habit for the first couple of weeks of doing make hello, and make, this build utility, just kind of magically knew to look for a file called hello.c and magically to create a program called hello, and then you could run it with ./hello. And then a week or so later we revealed that make is really just automating compilation of your program with the actual compiler, clang in this case, and passing it command line arguments like -o to get a specific output, like the file name hello instead of the default, which recall was a.out, passing in the name of the file you want to compile, and turning on any libraries that you might want to link into your program beyond the standard ones. But then you could still run it in exactly the same way. Starting today, when you write Python code and then want to run it, you're simply going to run the Python program itself. So just as clang is a C compiler, Python is itself not only a programming language but a program as well. And with the Python program, which understands the Python programming language, you can run code that you'll have written in a file called hello.py. And what this program is doing is a little bit different from what clang is doing, but we'll see that difference before long. But first, let me go over to VS Code and let's write our simplest, our first, of Python programs by doing code hello.py.
And then in this file, without any includes, any int main(void), I'm simply going to say print("hello, world"). All right. Now I'm not going to do make. I'm instead just going to do python hello.py. Cross my fingers as always and voila, my first program in Python. So it's sort of obvious that we got rid of the #include. We got rid of the int main(void). No curly braces. Only a couple of parentheses here. But what else is different to your eyes that's a little more subtle here versus C? Yeah. >> Yeah. So there's no f. So the print function is a little more human friendly. It's print instead of printf, where the f did mean formatted, but we'll see that we still have that functionality. >> No need for the line break. >> So no need for the line break, specifically the backslash n. And yet here's my cursor on the next line. So I dare say humans over the years realized we more commonly want a new line than not. And so they made the default actually give it to you automatically. And there's one more detail. Yeah. >> No semicolon. >> So there's no semicolon. So, I finished my thought at the end of the line, but I didn't need to explicitly terminate it with a semicolon. This is just with one program, all of these salient differences, but I'd argue that we got rid of all of the annoying stuff thus far anyway. So, we can really focus on what this program itself is doing. But what's exciting with Python, too, is just how quickly you can solve certain problems. And this isn't true of just Python. It's really any higher level language than C. In fact, just for fun, let me go ahead and implement problem set five, wherein you're challenged with implementing the fastest spell checker possible. So let me go back here to VS Code. Let's close out hello.py and clear my terminal window. And let me go ahead and do this. Let me first split my terminal by clicking this rectangular icon over here.
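The point about the automatic new line can be seen directly. In this sketch, the first print call is the whole program from the lecture; printed is a hypothetical helper added only to show exactly which characters print writes, using the function's standard file and end parameters.

```python
import io


def printed(*args, **kwargs):
    """Return the exact text print() would write for these arguments."""
    buffer = io.StringIO()
    print(*args, file=buffer, **kwargs)
    return buffer.getvalue()


print("hello, world")  # the entire hello.py

print(repr(printed("hello, world")))          # 'hello, world\n'
print(repr(printed("hello, world", end="")))  # 'hello, world'
```

The trailing newline is only a default: passing end="" suppresses it for the less common case where you don't want one.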
And that's going to give me two terminal windows now, left and right. Because in the first one at left, I'm going to cd into a directory I came with today, which is the staff's solution to problem set 5's spell checker in C. And on the right hand side here, I'm going to cd into another directory I brought with me today called python, inside of which is a translation of problem set 5 into Python. In particular, I've implemented in advance a speller.py file, which is the analog in Python of speller.c in C. And I've also prepared a dictionary.py file. Unfortunately, if we open up dictionary.py, you'll see that it's not actually implemented yet. So in dictionary.py, let's implement in Python problem set five and see how long it takes. Well, the first thing I'm going to do is declare a global variable. We'll call it words. And set that equal to the return value of a Python function called set, which essentially gives me a set object, wherein I can store a whole bunch of words without duplicates. Python's going to manage all of that for me. In effect, it's going to implement what I needed to implement myself in problem set 5, a hash table. Now, down here, I'm going to go ahead and define a function called check. Pass in as input a parameter called word because, of course, that's how it was implemented in C. But notice a difference already. In Python, we use a new keyword called def to define a function. And we don't have to specify the type of the variable being passed in, word in this case. And we also don't have to specify a return type for the function. Now, inside of this check function, it suffices to do this. I'm going to return word.lower() in words, which is effectively a boolean expression asking, is the lowercase version of this word in the set? If so, return true. Otherwise, return false. Done with the check function. Now let's go ahead and define another function called load, which recall took an argument of the dictionary that you want to load into memory.
And let's go ahead now and do this: with open(dictionary) as file, which effectively opens the dictionary. In C we used fopen; in Python we use open, and it gives it a variable name of file. Then once that file is open, I'm going to go ahead and update that entire set of words, which starts out empty, by taking the file, reading the entire contents top to bottom, left to right, and splitting all of the lines therein on the new lines that terminate each of the strings, effectively updating the set with every word in that dictionary. Then I'm going to assume that it all just worked, because there's a lot less effort for me to perform myself in Python. And I'm just going to go ahead and return True, capital T in Python. Done. Next, let's go ahead and define that other function from problem set 5, size, whose purpose in life was to tell me the size of the dictionary I had loaded. Well, in Python, that's pretty easy. I can just return the length, or len for short, of the set in which I've stored all those words. Done. And then lastly, I'm going to go ahead and define an unload function, which recall was responsible for freeing any memory I myself had allocated. I don't seem to have done any of that in Python. In fact, that's managed for me now. So, I'm going to go ahead and simply say return True, because there's no work to be done. And that's it. In like 19 lines of code in Python, most of which are blank lines, I claim I have reimplemented problem set 5 in Python. Well, let's take a look now at the difference. I'm going to go ahead and reopen my terminal window, and I'm going to go ahead and maximize it so we can see more output. And now I'm going to go ahead and run python, which is going to be not only the name of the language, but the name of the program we use today to start running our Python code. And I'm going to run it on speller.py, which I brought with me today, specifically on the largest of problem set 5's files, holmes.txt. Enter.
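Assembled from the spoken description, the four functions might look like the following. This is a reconstruction, not a copy of the file shown on screen, so the exact layout and line count are assumptions.

```python
# dictionary.py: a spell-checker dictionary backed by Python's built-in set

words = set()  # global set of dictionary words, initially empty


def check(word):
    """Return True if the lowercase version of word is in the dictionary."""
    return word.lower() in words


def load(dictionary):
    """Read every line of the dictionary file into the set."""
    with open(dictionary) as file:
        words.update(file.read().splitlines())
    return True


def size():
    """Return the number of words loaded."""
    return len(words)


def unload():
    """Nothing to free: Python manages the memory for us."""
    return True
```

A caller would run load once with the path to a word list (one word per line), then call check for each word in the text being spell-checked.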
And as with problem set 5 itself, we'll see a whole bunch of misspelled words being printed to the screen. Some of which might very well be misspelled. Some of which are just not in the dictionary. Some of which are simply possessives of words that are in the dictionary. But at the very end of this output, I should see not only how many words were found, but the total time involved, which appears to be 1.87 seconds. Not bad, seeing as it only took me like what, a minute or two to write the actual code. But there is going to be a trade-off. We'll see. Even though it took me much less human time and arguably was a lot easier to implement this spell checker in Python than I dare say it was for most everyone in C, let's see what that trade-off might be. Over in my left-hand terminal window, in which I'm in the C directory, which I brought with me as the staff solution in C to problem set 5, let's go ahead and make that spell checker. Then let's go ahead and do ./speller and run it on the same file, holmes.txt, and see how long the C implementation takes. Enter. And we see some of the same output. It might be slower sometimes just because of the cloud. There: total time spent in the CPU, not necessarily printing everything to the screen, which might take longer, is only 1.32 seconds versus the 1.87 seconds in Python. Now, while only half a second, that's a decent percentage of the total amount of time spent running the spell checker in each of the windows. And so, that alone seems to be one of the trade-offs. Even though it seems to be much faster and, dare I say, easier to implement a solution in Python, there's going to be trade-offs insofar as the code might very well run slower. And as we'll see today, that's in large part because C is, of course, compiled. That's why I ran make and in turn clang. And then the zeros and ones, the so-called machine code, is what you're running.
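The "decent percentage" can be worked out from the two timings quoted above. This is just arithmetic on those two numbers from a single run each; it is not a benchmark, and the variable names are illustrative.

```python
python_seconds = 1.87  # time reported for the Python speller
c_seconds = 1.32       # time reported for the C speller

difference = python_seconds - c_seconds
ratio = python_seconds / c_seconds

print(f"{difference:.2f} seconds slower")          # 0.55 seconds slower
print(f"{ratio:.2f} times the C running time")     # 1.42 times the C running time
```

So on this one run the Python version took roughly 42 percent longer than the C version.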
In Python, generally the computer is interpreting your code, essentially reading it top to bottom, left to right, much like a human in between two other humans might slowly translate one spoken language to the other if those two people don't in fact speak the same language themselves. So there's a bit of overhead when using Python, but I will say that the Python community has been working on this problem for some time. And so in general, it's not necessarily going to be as significant a trade-off, because there are certain tricks we can do. And in fact, underneath the hood, what the Python language can do for you, and the specific interpreter you're using, is technically, semi-secretly, compile your code for you into something called bytecode and then run that bytecode, which is more efficient than actually reinterpreting it again and again. But we'll see more of this over time. For now, let's take a look at maybe two other problems that we might solve, dare I say, more easily, more quickly than we could have in C for problem set 4. Let me go ahead and shrink down my terminal window here, close out dictionary.py, close one of my terminal windows, and cd back to my main directory. And let's go ahead and open up that bridge bitmap photograph that we used in problem set four and had to apply a number of Instagram-like filters thereto. Well, now let's go ahead and implement maybe one of those filters, the blur filter, whose purpose in life is just to blur this image. Well, let's see how long this takes. Let me go ahead and open up, say, blur.py, which is now going to be a Python program for blurring images. It's empty initially, but I can pretty much write this quite quickly. Now, let me go ahead and at the top of this file write the Python keyword from, then PIL, for Python Imaging Library, and import an object called Image and another one called ImageFilter.
In particular, these are two features of the Python Imaging Library that are going to make this so much easier to actually solve. And then let's go ahead and define a variable. We'll call it before, representing the before version of this image. And set that equal to Image.open("bridge.bmp"), where that of course is the name of the file we want to blur. Then let's go ahead and create a variable called after, representing the after version of this same image, and set that equal to before.filter(ImageFilter.BoxBlur, and then just to be a little dramatic, I'm going to blur it more so than you needed to in problem set four, but we'll see it more visibly now on the screen. Let's do an argument of 10. And then at the very end of this process, let's do after.save and save it in a file called, say, out.bmp. Done. So in just four lines of code, I claim I've implemented the blur function now in Python of what we did previously in C. Let me open my terminal window. Let me run the python command this time on blur.py. Cross my fingers as always. And indeed, I've made a mistake. Perhaps even if you've never written Python before, you can see it. And in fact, we'll see a number of these errors, some intentional, some unintentional. But on line four, what I intended to do was set that variable I created called after equal to before.filter. All right, that's all right. Let's go back down to my terminal window, clear it to get rid of all that, and rerun python blur.py. Cross my fingers even harder this time. Nothing bad seems to be happening indeed. Now, let's go ahead and open up out.bmp. And before we reveal that, let's go back to the original, which is bridge.bmp. And now dramatically, let's see the blurred version thereof. Voila. Hopefully to your eyes, too, it looks quite a bit blurry. Well, how about one more flourish? Those of you who were feeling more comfortable last week and implemented perhaps edges, edge detection, in C.
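For a sense of the work the library call is hiding, here is a from-scratch sketch of a box blur on a small grid of grayscale values. It is an illustration of the averaging idea only, not the library's implementation and not the lecture's code; box_blur is a hypothetical name, and pixels near the border simply average over whichever neighbors exist.

```python
def box_blur(pixels, radius=1):
    """Replace each pixel with the average of its neighbors within radius."""
    height = len(pixels)
    width = len(pixels[0])
    blurred = []
    for row in range(height):
        new_row = []
        for col in range(width):
            total = 0
            count = 0
            # Visit the square of neighbors, clipped to the image bounds.
            for r in range(max(0, row - radius), min(height, row + radius + 1)):
                for c in range(max(0, col - radius), min(width, col + radius + 1)):
                    total += pixels[r][c]
                    count += 1
            new_row.append(round(total / count))
        blurred.append(new_row)
    return blurred


image = [
    [0, 0, 0],
    [0, 36, 0],
    [0, 0, 0],
]
for row in box_blur(image):
    print(row)
# [9, 6, 9]
# [6, 4, 6]
# [9, 6, 9]
```

The single bright pixel in the middle gets smeared across its neighbors, which is the blurring effect. A larger radius, like the 10 used in the lecture, averages over a much bigger square.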
Well, let's see if we can whip that up quite quickly, too. Let's go ahead and write a file called edges.py using that same bridge.bmp file. And in this file, let's go ahead and do the following. As before, from the Python Imaging Library, let's import the Image feature and the ImageFilter feature. Then, as before, let's create a variable called before. Set it equal to Image.open, passing in bridge.bmp. So far the same as before. Now, let's create a variable called after. Set it equal to before.filter, passing in this time ImageFilter.FIND_EDGES, which is different from BoxBlur. And by definition, it's going to find the edges of this image. And then, as before, let's do after.save of out.bmp and just clobber the version of the blurred file that we just created. All right, that's it. Let's go ahead and open up my terminal window now. Let's go ahead and again run python, but this time on edges.py. Cross my fingers real hard. So far so good. And that was quite fast. Recall that the bridge.bmp image looked like this. But now when we open up this new and improved version of out.bmp, thanks to Python, in just four lines of code, we now have all of our edges detected. So, what can we then learn from C itself? Well, C had, of course, functions. And functions were those actions or verbs that simply got work done. And let's go ahead and compare side by side, much like we did with Scratch and C, the ideas that from today onward are still going to be the same, and how they translate to Python. So, on the left here, we'll now have our friend Scratch. This, of course, was one of the first puzzle pieces we saw. It's a purple puzzle piece saying say, and it was a function insofar as it said the value of its argument, which in this case is hello world. Well, we've already seen in Python what this looks like. It looks similar to the version in C, but it's no longer printf. There's no longer a semicolon, and there's no longer an explicit new line.
So in Python, it's quite simply this. Meanwhile, in Python, there are a whole bunch of libraries as well. Now, in C we had simply header files, and those header files give you access to the prototypes, that is, the signatures, of the functions that you want to use from those libraries. Python uses somewhat different vernacular, whereby Python has what are called modules and packages, and a package is just a collection of modules. But a module is just a library in Python speak, so to speak. So, anytime you hear someone discussing a module or a package in Python, they're just talking about using a library. And that library might come with the language itself, just built in as standard, or it might be a third-party library that you might download and install yourself, much like I did a few weeks back when we installed the cowsay program so that I could actually have a cow or other animals on the screen display text. So, in C, recall, we had something like this: #include <cs50.h>, which was the header file pre-installed for you somewhere. But we will have, for at least this week, an analog of the CS50 library for C also in Python, just to make this transition from C to Python a bit easier. These too, though, are meant to be training wheels that you can take off, and should take off, you know, even within a week or so. It's just meant to smooth that transition and make clear what's the same and what's different. So in the CS50 library for Python, we also have a function called get_string, whose purpose in life is to get a string. To access it, though, you don't use #include <cs50.h>. That's a C thing. In Python, you would say from cs50 import get_string. It's a little more verbose, but it's also a little more precise as to what you want from the library, especially if you don't want the whole thing loaded into memory. So here, for instance, is now a Scratch program that was a little more interesting than just printing out hello, world.
This was the first program we wrote that actually got some user input. So in fact, let me go back to VS Code and let's see if we can't resurrect this C program real quickly in the form of a new hello.c. So I'm going to run code of hello.c, and then in my code tab I'm going to do #include <cs50.h>, #include <stdio.h>, and then below that I'm going to go ahead and whip up our familiar version of this: int main(void). And then inside the curly braces we'll bring back string, even though we now know it's char star. We'll call our variable answer. Set it equal to get_string. Ask the user, quote unquote, what's your name, with a space just to move the cursor over. Still need my semicolon in C. And then after that, recall back in week one, we did printf of hello, percent s, backslash n, and then plugged in the variable answer so as to see hello, David, hello, Kelly, or something else. Just to be safe, let me do make hello. All is well so far. Dot slash hello, type my name. And this version in C seems to be working. Okay, so in C, these lines of code here translate pretty literally to what we just saw, although we got the answer variable in Scratch for free. That blue puzzle piece just existed without our having to create it. But it's a decent number of hoops to jump through in order to just get user input and print it out. Well, in Python, this is going to get a little more succinct, in that the Python version of this code is now going to look like this. printf is now print. The semicolons are gone. And what else seems a little bit different? Yeah. >> I don't need any placeholders. >> Yeah. So, we don't need the percent s anymore. In fact, I'm curiously using a plus, which, if some of you studied Java or some other language, you might have actually seen before. Even if you've never seen Python before and you've only seen C in CS50, you can probably guess what the plus is doing. Even if you don't know the technical vocab, what is the plus probably doing here? Yeah.
So, it's concatenating, or joining together, the thing on the left with the thing on the right. And we actually had that vernacular in the world of Scratch. We had the join puzzle piece that joins hello, comma, space and the value inside of answer. A plus in Python can do exactly the same thing. So it's a little more user friendly than having to anticipate, oh, let's put the placeholder here and then come back later and plug in the variable. Humans over time just realized that it's a lot easier to do it this way than to bother with placeholders, though you can still use placeholders for other purposes. Another subtle difference between the C and Python versions of these two lines, more subtle than that. What's missing? Yeah, in back. >> So the backslash n is again gone in Python. So that sort of happens for free indeed. And one more difference. >> You don't need to declare the type of answer. >> Yeah, we don't need to declare the type of answer. Recall that if we rewind to the C version, you needed to tell the compiler that this is a string. And last week, we could have changed string to char star, but we still had to tell the compiler what data type we're putting into that variable. In Python, we can now get rid of that data type, and Python will just figure it out from context. If get_string returns a string, well, then obviously the variable should store a string. If a function returns an int, well, then obviously the variable should store an int. And the language is just doing more of that decision-making for you, just to save you time and save you thought. There's a subtlety here, though, where we can make this program a little bit different. In fact, let's whip it up first in Python. Let me go back to VS Code here, clear my terminal, and let's go ahead and create a program again called hello.py. That'll open up my previous version thereof. And just so we can see these things side by side, I'm going to drag that tab over to the right of VS Code and let go.
And now you can see the C version still on the left and the Python version at the right. What I'm going to do here now in my Python version is change it to be quite like the version in C now at left. So, as promised, I'm going to do from cs50 import get_string. Then below that I'm going to say simply answer equals get_string, quote unquote, what's your name, question mark, space, no semicolon, but, whoops, a close parenthesis. Then on the next line, I'm going to do print, quote unquote, hello, comma, space, close quote, plus answer. Down here, I'm going to go ahead and run python of hello.py. Again, no compilation step. I'm just going to interpret it line by line. What's my name? David. And it seems now to work exactly the same. Now, it turns out in Python there are even more ways to solve problems like this, even trivial problems like this. So here we're using the plus sign not as addition per se, but as the concatenation operator, the join operation. If you want, though, you can take advantage of the fact that print in Python can take more than one argument. It can take two or three or four or even zero. You simply change the plus to a comma, get rid of that seemingly superfluous space, and just give print two things to print, because it turns out, per the documentation of print, which we'll eventually see, that if it takes two or more arguments, by default it separates them for you with a single space. And that's something we can override as well. Which one is better? Like, I don't know, they're sort of equivalent. It's such a trivial difference, but it speaks to the flexibility that you'll start to have, whereby the language is a little less rigid than C was, certainly when it comes to printing strings. So in fact, if I go back to VS Code here and I go ahead and change that plus to a comma and get rid of the space inside of the quotes, I can rerun python of hello.py, type in my name, and we see exactly the same result there.
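The two approaches just described, joining with a plus versus passing two arguments to print, can be sketched side by side. This is a minimal sketch rather than the lecture's exact file: the variable answer is hard-coded here as a stand-in for whatever the user would type, and the name joined is introduced only for illustration.

```python
# Stand-in for the user's input; in the lecture this value comes from get_string.
answer = "David"

# Concatenation: the space after the comma must be typed inside the quotes.
joined = "hello, " + answer
print(joined)

# Two arguments: print separates them with a single space by default,
# so the space inside the quotes is no longer needed.
print("hello,", answer)
```

Both calls to print should display the same text, which is the point made above: the difference is stylistic rather than functional.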
But we can take this one step further. Even though it's going to look a little cryptic, this is sort of the more Pythonic way to do things. And that, too, is actually a term of art: to do something Pythonically is to do it the way that most Python programmers would do it. It's not the only way. It's not necessarily the right way, but it's sort of the recommended way in the community. So here we have that latest version, where I'm passing two arguments to print. The first is, quote unquote, hello, comma, and the second is the value of answer. I could similarly write this same program with this crazy syntax. It takes a little getting used to, but it turns out it's actually kind of nice overall. What's obviously different? Well, one, these weird curly braces are back. They're not part of the logic of the program. They're literally inside of the double quotes. But you can probably guess what this does for me, because there's one other crucial difference. What else has changed between before and after? Yeah, there's this weird f, which is not part of printf. It's actually inside of the parentheses and next to the double quotes. And even this one, when it came out, was a little weird looking to people. But this is how you get this thing to be a formatted string, aka an f-string, as opposed to it being just a literal string of text. Now, you can probably guess what it means to put the variable's name inside of the curly braces. It means the value of that variable is going to be substituted right there. Similar in spirit to the percent s in C, but a little more explicit. With the percent s, you had to remember that that percent s corresponds to this variable's value or something like that, which was just annoying if anything. But this time you have a placeholder in curly braces that just says what you want there, that particular value.
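As a rough illustration of the f-string described above, the sketch below builds the same greeting two ways. The hard-coded answer is a stand-in for user input, and the second form, Python's percent-style formatting, is not shown in the lecture; it is included here only as an assumption-labeled comparison with C's percent s.

```python
# Stand-in for the user's input.
answer = "David"

# f-string: the f prefix tells Python to substitute the variable's value
# wherever its name appears inside curly braces.
formatted = f"hello, {answer}"
print(formatted)

# Not from the lecture: Python's older percent-style formatting, shown only
# because it resembles the %s placeholder from C's printf.
old_style = "hello, %s" % answer
print(old_style)
```

The f-string keeps the variable's name right where its value will appear, which is the readability advantage described above.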
And what this means more technically is that the answer variable will be interpolated by the interpreter, which means its value will be plugged in right there. So let's try this. Let me go back over to VS Code and, quite simply, on my last line of code here, let's change the input to print to be, quote unquote, hello, comma, and then curly brace, answer, then close curly brace, close quote. And I've done this intentionally, but let's see. Let me go ahead and rerun python of hello.py. D-A-V-I-D. What are we about to see? Hello, answer, in curly braces. So this is a bug, but just to demonstrate what is going on and what's therefore missing: what did I forget? Yeah. >> Yeah, I didn't declare that this is a so-called f-string, or format string. The fix for this, weirdly, is just to put an f right there. And now if I rerun python of hello.py, type in my name again, cross my fingers, now I see that the variable has indeed been interpolated and its value plugged in where I wanted it. All right. Turns out we can take off one of these training wheels already. I propose that get_string just exists in the library to smooth the transition, but honestly it's not really doing anything all that interesting. So let's take this first training wheel off. It turns out that Python comes with a function appropriately named input, such that if you want to get input from the human via their keyboard, you can just use the input function. So we can already, for this program, get rid of the CS50 library, because input essentially behaves just like the get_string function. So if I go back to my Python version here, I can change get_string to input. And I can even go and delete this training wheel up there. Rerun python of hello.py in my terminal. D-A-V-I-D, enter, and we're still in business as well. So input is generally going to be the way you go about getting input now from the user. All right, let me pause here and see if there are any questions as we try to bridge these two worlds from C to Python.
Yeah. >> So in Python, we don't need the main function. And why is that? >> Good question. In Python, why don't we need the main function anymore? Because clearly that's been omnipresent in, like, every program we've written thus far. And here, in all of our Python programs thus far, it's absent. It turns out that humans realized it's just so common that you want the file you're editing to be the main part of your program. Like, why bother adding the additional syntax of saying int main(void) or something analogous? It's just easier if you want to write two lines of code to get some work done. Why do you have to waste my time adding all of this boilerplate code, which we've been doing up until now? Now, that said, we're going to bring back main in a little bit, because it will solve a problem. But generally speaking, what I'm doing here is indeed a program, but people in the real world would also call these scripts, where a script is like a lightweight program that pretty much just reads top to bottom, left to right. It might be fairly lightweight. It's really synonymous with writing a program, but this is again one of the appeals of a language like Python. You can just get right in and get out and get the job done. Even Java has moved to this in recent years, where you don't have to put everything in a class, public static void main, for those familiar. You can just write System.out.println and get some work done. Yeah. >> Is input only for strings? >> Good question. Is input only for a string? Yes. Right now it will get input from the user via their keyboard, and you'll get back a string, just like get_string. And we'll come back to why that's maybe not a good thing. All right. So what more might we want to do at this point? Well, let's tease apart some differences now with C. So up until now, every argument we've ever passed into a function in C, and Scratch for that matter, is a so-called positional parameter.
And a parameter is the same thing as an argument, but generally, when you're looking at the function from the function's perspective, it's a parameter that it accepts. But when you're calling the function and passing in an input, you typically call it an argument. They refer to essentially the same thing. And all of the parameters we've been passing into functions thus far have been positional, in the sense that the order matters: the first thing, then the second thing, then the third thing, and so forth. For instance, with printf, the first thing has to be the quoted string, maybe with a placeholder, and then if there's another argument after the comma, that can be the second argument, the third argument, and so forth. But it turns out Python additionally supports what are called named parameters, whereby you don't have to rely only on the order in which you're enumerating the arguments to a function. And that's helpful because some functions, especially in the real world, when you start using other people's libraries that have lots of functionality, might not take just one or two arguments. They might take four arguments, 10 arguments, maybe even more. And it can just be unwieldy to have to remember the precise order of all those arguments. You're just asking for trouble: you're going to screw up, or a colleague is going to get the order out of whack. So with named parameters, you can actually be explicit with Python and tell it what argument you are trying to pass in by giving it an actual name. So let me go over to VS Code here and propose that we use this for really the simplest of programs, in order to override that default new line that we seem to be getting for free just by calling print. In other words, let me go ahead here and clear my terminal window. Let me close hello.c and focus only on hello.py for just a moment.
And let's make it much simpler, like the very first version, and just print out, using Python's print function, not printf, quote unquote, hello, world, close quote. And now here I'm going to do python of hello.py. Enter. And we still see that the cursor moves to the next line. The dollar sign moves to the next line because I'm automatically getting a new line. Well, what if you don't want that? How can you override that behavior? Well, you can actually use a named parameter in Python. I can go up here and add a second argument. If it were just something like, quote unquote, this, that would literally print out the word this, because it's just another string. But if I give it a name, like end equals quote unquote, I can override the default behavior of the Python print function by changing the value of its end parameter to be the so-called empty string, quote unquote, which means literally there's nothing there. Watch what happens now. If I run python of hello.py and hit enter, the dollar sign is weirdly, and in sort of an ugly way, on the same line, just like it was when I made the mistake in C in week one of omitting the backslash n. That is to say, the default value of this end parameter really is, quote unquote, backslash n. And I can make it explicit by changing my code as such. I'm going to go ahead and rerun python of hello.py. And now the cursor is back on the next line. And not that this is that useful other than overriding that default, but you could do fun things like exclamation point, exclamation point, exclamation point, if you really want print to be excited to print some things for you. And if I now run python of hello.py a third time, now you see that it's ending with exclamation point, exclamation point, exclamation point. Looks a little stupid with the dollar sign. So you could even toss in a new line there. Run it yet again. And now we sort of get both of those there.
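The effect of the end parameter can be sketched as follows. To make the otherwise invisible newline characters easy to inspect, this sketch sends print's output to an in-memory buffer through print's file parameter; the buffer is an assumption added for illustration and is not part of the lecture's hello.py, which prints straight to the terminal.

```python
import io

# In-memory text buffer, used here only so we can inspect exactly
# which characters print writes, including newlines.
buffer = io.StringIO()

print("hello, world", file=buffer)               # default: end="\n"
print("hello, world", end="", file=buffer)       # empty string: no newline at all
print("hello, world", end="!!!\n", file=buffer)  # custom ending plus a newline

# repr shows each newline as a visible \n
print(repr(buffer.getvalue()))
```

Dropping the file argument from each call would write to the terminal instead, which is what the lecture's version does.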
But I would say the common case is to use that end named parameter simply to override it. So how do you learn more about these kinds of things? Well, if you go to the official documentation for Python, which is a thing more so than with C, like if you want to learn more about Python and the functions it offers and the arguments they take, you go to the official documentation, docs.python.org. This is essentially analogous to the so-called manual pages, or man pages, that CS50 has a version of, but there is no one de facto source for those man pages. Several different versions of them exist in the wild, whereas Python itself, as a community, maintains its own official documentation. So for instance, if you go to a specific URL like this, ending in functions.html, you'll see an exhaustive list of all of the functions that come with Python besides just the print function. And we'll see a bunch more today. If specifically you scroll down to the print documentation, you'll see something that's a little arcane that looks like this. But this is representative of a Python prototype, if you will, often also called a signature, that just tells you the name of a function and then how many and what type of arguments it takes. So how to read this? Well, the print function takes some number of objects. In Python specifically, this syntax of star objects just means zero or more objects, whatever that is, like a number or a string or something else, the stuff you want to print out. After that, if you start using named parameters, you can specify what the separator is, the separator between arguments to print. So, recall that when I did, quote unquote, hello, comma, and then answer, that was separated automatically for us by a single space, even without my hitting the space bar inside of my quotes. That's because the default value here is in fact a single space. The default value for end, as promised, is indeed backslash n.
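To complement the signature walkthrough above, here is a small sketch of the sep parameter and of print accepting zero or more objects. As an assumption made purely for inspection, output is again written to an in-memory buffer via print's file parameter so the result can be examined as a list of lines; the sample strings are illustrative.

```python
import io

# In-memory buffer so the printed lines can be inspected afterward.
buffer = io.StringIO()

print("hello,", "David", file=buffer)          # default sep=" " inserts one space
print("hello,", "David", sep="", file=buffer)  # empty separator: nothing in between
print("a", "b", "c", sep="-", file=buffer)     # any string can serve as the separator
print(file=buffer)                             # zero objects: only the end newline

lines = buffer.getvalue().splitlines()
print(lines)
```

Because sep and end are named parameters, either one can be overridden without having to remember its position in the signature.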
And then there's some other stuff related to file I/O that print can also deal with, but more on that perhaps another time. There's one curiosity here. In Python, it turns out that you can use double quotes or single quotes around strings, whereas in C it was much more regimented. Double quotes are for strings, and single quotes are for chars, single characters only. It doesn't matter in Python which one you use, so long as you're consistent. And stylistically, you should really pick one and go with it. The only time you should really alternate between the two is maybe if you want to put an apostrophe for some human's name inside of double quotes rather than single quotes, or something like that. But generally you have a little more flexibility in Python. And you'll see different conventions in different languages. The Python community tends to use single quotes, at least in the documentation. The JavaScript world tends to use single quotes. We in CS50 often use double quotes, just for consistency with what we do in C. But any community or company would typically have its own style guide that dictates which one you should use, if only for consistency. Questions, then, on this here print function, as just representative of all of the docs that you'll see? All right. Well, let's take a quick look at variables. We've used these a few times already, but let's focus in a little more detail on what's actually different. In Scratch, if you wanted to create a variable called counter and set it equal to zero, you would use this orange puzzle piece here. In C, you would do something like this: the type of the variable, the name of the variable, and then set it equal to the initial value, semicolon. In Python, it's going to be a little similar, but you can probably guess where we're going with this. How is this line of code probably about to change? Yeah. >> Good. We're not going to bother with int or the data type more generally.
We're just going to say counter, because obviously a smart interpreter can just figure out from context that you're putting a zero in there. It's obviously an integer. And what else is about to go away? The semicolon. So this is the C version. And voila, this now is the Python version. And as silly as this example is, it's kind of representative of how languages like Python just tend to be a little more programmer friendly, because you just type less and get the same work done. All right. So if we wanted to do something now in Scratch like increment the counter by one, you would use this puzzle piece here. In C, we could do something like this. In Python, it's going to be almost exactly the same, except of course no semicolon. In C, we could alternatively do this. And you can also do this in Python. In C, though, you could also use what other technique? >> Plus plus. >> I'm sorry, but Python has taken that away from us. So if you got into the habit of using plus plus or minus minus, that's great. Use them in C all you want. In Python, they just don't exist. So you'll see this more commonly instead as the heuristic. All right. What about the various types that exist in Python? Because even though you don't have to specify the types when declaring your variables, they do in fact exist underneath the hood. And it's worth knowing a little something about them, because not knowing will often lead to some form of bug. So in C, we had types like this: bool, char, double, float, int, long, and string, the last of which was thanks to the CS50 library, and which last week we would have started calling char star instead, which is still a data type, the address of some char. In Python, we're going to whittle this list down to a subset of those, essentially, whereby we still have bools, we still have floats, we still have ints, and we do have strings, but they're literally called strs, s-t-r. So it's not a CS50 thing. The Python community calls strings str.
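The variable and type points above can be sketched in a few lines. The variable name counter follows the lecture; the use of the built-in type function to reveal the underlying types, and the sample values passed to it, are additions for illustration.

```python
# No type and no semicolon: Python infers that counter holds an int.
counter = 0

# Two equivalent ways to increment; counter++ does not exist in Python.
counter = counter + 1
counter += 1
print(counter)

# Types still exist under the hood; type() reveals them.
type_names = [type(value).__name__ for value in (True, 3.14, 42, "hello")]
print(type_names)
```

The names that type reports here correspond to the bool, float, int, and str types listed above.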
But absent from this list is any mention of star, not to mention char star. There are no pointers in Python. And indeed, as powerful as I'd hope you found weeks four and five to be, I dare say you also found them incredibly frustrating and challenging and wont to yield bugs in your code, because with that power of memory management comes a whole slew of potential mistakes that you can make. And that's true not just for CS50 students, but for programmers, adult programmers, full-time programmers around the world. And so among the other features of languages like Python is that they try to take away certain features of languages like C that were just too dangerous in the first place. They might be wonderfully powerful, might help you solve problems more quickly, more precisely, but if they tend to do more damage than they're worth, sometimes it's worth just abstracting those details away. Similarly, Java has references, as some of you might know, but does not have pointers per se. You can't go poking around arbitrary locations in memory in the same way that you can with C. So, let's take some of these data types out for a spin and see what's the same and what's different. Let me go back to VS Code here, and let me propose that we bring back one of our old calculators from a while back. So, let me clear my terminal, close hello.py, and let me go ahead and open up a version of this program that I brought in advance, which was our calculator version 0 from back then. So, just to remind you, one of the first versions of our calculator had the CS50 library as well as the standard I/O library. And then we simply got an int using get_int in week one. We got another int in week one using get_int. And then we simply performed some addition. So it was a very trivial calculator that we did very early on, just to demonstrate some of the operators and syntax of C. Well, let's go ahead and try converting this to Python by creating our own program, calculator.py.
So in my terminal window, I'm going to write code of calculator.py. It's going to open another tab, which I'm just going to drag over to the right, just so we can see both side by side. I won't bother with, say... well, let's do it for parity here. Let me copy the C code into the Python file, even though this will not work in the same way, but let's keep what we need and get rid of what we don't. So instead of the slash slash for comments, in Python it turns out the convention is to use a single hash symbol, like this. So it's a minor difference. It's half as many keystrokes, so that's nice. But we're not going to include anything like this. We are going to do from cs50, let's import a function that I promised would exist called get_int. But we'll soon get rid of that training wheel as well. We don't need main or this curly brace. We don't need this curly brace. And we don't need all of this indentation as a result. So I'm going to move all of that over to the left. I'm going to fix all of the comments to be Python comments by changing the slashes to hash symbols. And now I'm going to change each of these three lines of code, as you might expect, to the Python version. So, as you can probably guess already, we can get rid of the int there and the int there. We can get rid of the semicolon here and the semicolon here. We can get rid of the f in printf here. And we can get rid of the semicolon here. And there are a few different ways we could do this, but I dare say the simplest is going to be to get rid of the format code altogether, and that first argument, and just tell Python to print x + y. So, there are a few different ways we can do this, but that's probably the most literal translation of the program at left to the program at right. Let's reopen the terminal window and run python of calculator.py and hit enter. Let's do something like x is 1, y is 2, and hopefully we do in fact get 3. All right, so that's all fine and good, but let's take off one of our training wheels now.
So, let me get rid of our C version here and focus just for the moment on Python. Let's take away this C code. And what was the function we can use to get user input? Yeah, it was called... a little louder? It's just called input. So, let's get rid of CS50's get_int already and use input instead. All right. So, this program is much simpler already. So, let's go ahead and reopen the terminal window. Run python of calculator.py. Do 1 again for x, 2 again for y, and of course 1 + 2 equals 12. So what's going on here? Because clearly this is a step backwards. Yeah. >> Yeah. So in the context of strings, plus represents concatenation, the joining of two arguments on the left and the right. Here, that seems to be what's happening, because it's not 12 per se. It's more literally one, two, concatenated together. But why is that? Well, apparently the input function indeed returns a string. That is, those are the keystrokes that came back from the user. They might look like numbers, Arabic numerals, to us, one and two, but they're being treated as a string. More technically, underneath the hood, there is some char star stuff going on there, even though we're not using that same terminology. So intuitively, what's going to be the solution, without just reverting to using the training wheel that is the get_int function from CS50? Put another way, how did CS50 probably implement get_int, might you think? >> Yeah. So recall that in C we could cast some data types to other data types, typically ints to chars or chars to ints. It's not quite as simple as casting in this case, because underneath the hood, thanks to our knowledge of C, there's a bunch of stuff going on. There's probably a one and there's a null character. There's a two and there's a null character. So it's not quite as literal as a char to an int or an int to a char. So, we're going to more properly convert the string, or the str, to an int. We're not casting, but converting.
And converting just implies that there's a little more work that has to be done. But thankfully, Python can do this for us. In fact, let me go up to line four here and say... well, actually, let's do this a couple of ways. Let's first convert the x value to an integer. Let's convert the y value to an integer as well. So, funny enough, it's very similar syntactically to casting, but in C, when you cast something, you actually wrote the data type in parentheses. Now, the data type itself is a function that takes an argument, which is the str, or string, that you want to convert. So, let me go back to my terminal, do python of calculator.py, enter, type in 1, type in 2, and now I get back my 3 answer. Now, as you might imagine, just like in C, we can kind of play around with where we're performing some of these operations. And this looks, you know, arguably a little less obvious now as to what is being added. I really like the simplicity of x plus y, which just does what it says. So I could convert these in other ways. I could say, after line four, you know what, change x to be the int version of x. But generally speaking, that's kind of wasting a line of code by just doing something you could do on a single line. So let me delete that and instead just say that, well, if I know the return value of the input function is a str, let's just pass that output as the input to the int function. And it'd be a little more Pythonic, so to speak, to just pass the input function's output as the input to int, which is really hard to say, but we've done this in C, just nesting function calls like this. All right, so if I run this one more time, python of calculator.py, type in 1, type in 2, we're back now in business. Now, what I won't trip over just yet is a subtlety, whereby I'm deliberately typing in actual numbers like one and two, but if you are following along at home or on your laptop and you were to type in cat and dog, like, bad things will happen.
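The difference between concatenating strings and adding integers can be sketched as below. So that the example runs without waiting on a keyboard, the strings are hard-coded as stand-ins for what input would return; the variable names are illustrative rather than the lecture's.

```python
# Stand-ins for what input() would return: input always gives back a str.
x_text = "1"
y_text = "2"

# With two strs, + concatenates, which is the "12" surprise from above.
concatenated = x_text + y_text
print(concatenated)

# int() converts each str to an integer, so + now performs addition.
total = int(x_text) + int(y_text)
print(total)
```

In an interactive script, the same conversion is typically written by nesting the calls, as in x = int(input("x: ")). Passing int a str that is not a number, such as "cat", raises a ValueError.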
But we'll come back to that before long. All right. Questions, though, on any of this conversion of our strings to our integers in this case? All right. Well, what more does Python offer to us? Well, in addition to these data types, there are actually going to be a bunch of others, a few of which we'll actually use today. In fact, we'll see ranges of numbers. That's a thing built into Python. We'll see lists of numbers, which is going to be like a new and improved version of an array that solves, like, all of last week's problems when we talked about the downsides of using arrays. There are going to be tuples, for things like x, y coordinates or GPS coordinates or anything where you have collections of values. There are going to be dicts, or dictionaries, whereby you can have key-value pairs provided to you without having to write a whole hash table yourself. And you can have sets, which you can use to just contain unique values that you want to check for membership. And there are bunches of other data types as well. And this is where languages like Python start to get really powerful, because of all of the data structures we talked about in C, we really only got from the language itself an array. Everything else we had to build, or at least talk about building, in class. These now, and more, come with the language. Meanwhile, in the CS50 library for Python, just so you know, there are a whole bunch of functions. These, though, were the C versions. In Python, it stands to reason that we don't need as many, because there are fewer data types in Python, but get_float, get_int, and get_string do all exist in the CS50 library for Python.
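A quick tour of the built-in types just listed might look like the following sketch. All of the values here (the scores, the coordinates, the inventory) are hypothetical sample data chosen only for illustration.

```python
# range: a sequence of numbers, here 0, 1, 2
numbers = list(range(3))

# list: like an array, but it can grow on demand
scores = [72, 73, 33]
scores.append(90)

# tuple: a fixed collection, handy for coordinates (hypothetical values)
point = (3, 4)
x, y = point

# dict: key-value pairs, with no hand-written hash table required
inventory = {"apples": 3, "bananas": 5}

# set: keeps only unique values and supports fast membership checks
letters = set(["a", "b", "a"])

print(numbers, scores, point, inventory, letters)
print("a" in letters, inventory["bananas"])
```

Each of these comes with the language itself, so no extra library needs to be imported to use them.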
You're welcome and encouraged to use the CS50 library, because indeed among the goals for problem set six are going to be to redo some of your C problem set problems in Python, where you can look at your own C code and, hopefully you like that solution, figure out how to convert it line by line, essentially, to the corresponding Python version. But clearly we've seen ways of taking these training wheels off quite quickly as well. And in fact, if you wanted to import all three of those functions for a larger program, you could do this just following the approach that I took already, but you can also just separate them by commas like this. Or it turns out you can also import the whole CS50 library, as you'll see in some code, and then just access the functions within with slightly different syntax as well. All right, how about another construct from Scratch and from C, now in Python? So in Scratch, if we wanted to do a comparison like is x less than y, where each of those are variables, then say as much, it looked like this. In C it looked like this. And nicely enough, you can probably guess already what's going to change here. Like, the f in printf is about to go away, the backslash-n is about to go away, the semicolon is about to go away. But some other stuff's about to go away as well. Focus your attention on the syntax like the parentheses and curly braces, because in Python, it's just this. So we got rid of the parentheses because they didn't really add all that much logically. We got rid of the curly braces, which technically we could do in C anytime there's a single line of code inside of a conditional, but for consistency, stylistically, we always used them. Python, though, does not have you use any of those curly braces at all. But Python requires that you indent your code properly.
So, if you've ever been among those who are writing out your program and everything is just crazily left-aligned and a big mess until style50 swoops in and cleans it up for you, you're not going to be able to write Python code like that anymore. That's been such a societal problem among programmers, newbies and professionals alike, that the language itself requires logically that if you want this line of code to execute when this boolean expression is true, you've got to indent this line, by convention four spaces. You can't be lazy and leave it all left-aligned and sort of fix it up later. This has made Python code arguably more readable because of these language-based requirements. Meanwhile, let's look at an if-else construct in Scratch, which looked a little something like this. In C, it looked like this, which is kind of a lot of lines just to express the simple idea. All of those same things are going to go away, whereby in Python, it looks like this instead. And the only other difference worth calling out is that because you don't have the curly braces, you do have a colon, which precedes the subsequent indentation as well. Meanwhile, if we've got an if, else if, else in Scratch, in C, of course, it looked like this. A lot of this is going to go away in the flash of a screen, but there's going to be a curiosity, which is not in fact a typo. Notice what happens with the else if. It's abbreviated elif. And honestly, to this day, all these years later, I can never remember if it's elif or else if because different languages use different shorthand spellings of this phrase. It's elif in Python, because that's maybe the most succinct you can make the two words themselves. But everything else is effectively the same, including the additional colon this time. Okay, questions on any of those conditionals and syntax? Yeah. >> So, what language did they code Python? >> What a good question. What language did they code Python in?
The interpreter we are using within VS Code is itself written in C, a.k.a. CPython. However, you can implement a Python interpreter really in any language, including machine code, like raw zeros and ones if you have that much free time, or in assembly language, which we saw briefly weeks ago. You could write an interpreter for Python in Python if you really want to be meta about it, or in C++ or in Java. This is the thing about programming languages. You can use any language to create a compiler or interpreter for another language. What's going to vary is just how easy or difficult it is and how much time it therefore takes you. Good question. Other questions on any of these features? All right. Well, let's do something a little bit different in Python vis-à-vis C by opening up a comparison program that we looked at some time ago. So, let me go back to VS Code here. I'm going to close my calculator and I'm going to open up now from my distribution code today a version of our comparison program from a while back, which was essentially version three, zero-indexed, thereof. So this one has comments, which the very first one in week one did not. But notice as a refresher what this comparison program was doing. It was including cs50.h and stdio.h. It was prompting the user for two integers, x and y, via get_int. It was then doing a very simple comparison, comparing x against y to determine if x is less than, greater than, or, dot dot dot, equal to y. So just so that we can go through the motions of converting one of these to the other, let's do that side by side. Let me code a program called compare.py. Let me close my terminal. Drag the Python version over to the right here. And without comments this time, let's just do from cs50 import get_int. Then below that, let's do x equals get_int and ask the user, quote unquote, what's x, question mark. Then let's ask the user for y using get_int, quote unquote, what's y, question mark.
Then below that, let's do if x less than y, colon. Go ahead and print, quote unquote, x is less than y. Elif x greater than y, colon. Go ahead and print, quote unquote, x is greater than y. Else, colon, let's go ahead and print out, quote unquote, x is equal to y. So I dare say these are now equivalent. It's clearly fewer lines, because a lot of the lines at left were admittedly comments, but also some curly braces. And there's more syntax, like parentheses, that we got rid of too. Let me open my terminal window. Let me run Python of compare.py. We'll type in one and two. x is less than y. Let's do it again using two and one. x is greater than y. Let's do it one last time. One and one. And of course, those two now are equal to each other. All right. But why go down this road again? Because that was kind of a simple exercise. But recall that we introduced this comparison of ints because it was so sort of stupidly simple, even if the syntax that week was completely new. But we ran into an issue pretty fast when we started comparing strings. And that was a problem we really only fixed in week four when we finally revealed what a string actually is. If we focus a bit more on Python strings, it turns out that we can solve that problem much more easily in the world of Python. In fact, let me go back to VS Code here. Let me close these two versions of int comparison. Let me open up at left a version of my program that I brought with me here from week four, wherein we finally revealed that a string is just a char star. But recall that the solution in week four, as well as in week one when we first encountered this problem, was to use strcmp, a function whose purpose in life is to compare two strings character by character by character using a for loop or something like that. It therefore has knowledge of how to navigate pointers and how to look for the null character, the backslash zero, at the end.
And strcmp came from our friend string.h. Well, how can we go about implementing the same idea in Python? Well, let's open up VS Code's terminal window, open up a new program called compare.py, but this time let's get rid of the integer version thereof. Let's get two strings from the user. And I won't even use any CS50 training wheels. Let's just use the input function to get s and ask the user for a value of s, so quote unquote s, colon, with a space, close quote. t equals input, ask the user for a variable t. And then let's just ask the question. If s equals equals t, then print out, quote unquote, same. Else go ahead and print out, quote unquote, different. Let me move these side by side just so you can see the difference. Notice how much code we had to write and how much we needed to understand in order to compare something as trivial as two strings in C. But in Python, we're literally just using equals equals. And let's see if it actually works. So, Python of compare.py. Enter. Let's type in maybe cat for s and dog for t. And those are in fact different, but we would have gotten the same answer in C. Let's rerun Python of compare.py and type in cat. Type in cat again. And now it's detecting them as the same. So wonderfully, Python has solved that seemingly annoying problem by not taking us literally: don't compare the pointer against the pointer, compare what a reasonable programmer probably really cares about, the values of those strings. So the equals equals is doing all of the for loop or the while loop, iterating over those things character by character and actually giving us the answer we want. So what else gets easier in Python? Well, let's focus a bit more on these strings. Let me go back into VS Code here. Let me close out our two comparison programs and clear my terminal. And let me go ahead and open up a prior program that we wrote, that one called agree.c. And namely, in the staff version of the code online, this was agree2.c, which is where we left it.
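The two compare.py programs dictated above can be sketched roughly as follows. To keep the sketch runnable without typing, the comparisons are wrapped in two hypothetical helper functions (compare_ints and compare_strings) that return the message instead of prompting with get_int or input; the lecture's versions print directly.

```python
def compare_ints(x, y):
    # if / elif / else, each ending in a colon, with the body indented
    if x < y:
        return "x is less than y"
    elif x > y:
        return "x is greater than y"
    else:
        return "x is equal to y"


def compare_strings(s, t):
    # == compares the characters of the two strings, not their addresses
    if s == t:
        return "Same"
    else:
        return "Different"


print(compare_ints(1, 2))
print(compare_ints(2, 1))
print(compare_ints(1, 1))
print(compare_strings("cat", "dog"))
print(compare_strings("cat", "cat"))
```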
Now recall that in this C program, agree2.c, we did the following. We first, using CS50's get_char function, prompted the user for a char, hopefully y or n for yes or no respectively. And then we used a boolean expression, actually the combination of two using the two vertical bars, to ask whether the inputted character is capital Y or the inputted character is lowercase y. And if so, we went ahead and printed out that the user agreed. Otherwise, if they typed in anything else for that character, we simply printed out not agreed. Well, how can we go about implementing that same program in Python? For instance, in a file called agree.py. Well, let me go ahead and open up my terminal window again. Let's create a file called agree.py. not pi as before. Let me go ahead and drag it over to the right so we can see these two things side by side. And let me go ahead and do this. I'm going to set a variable, say called s, equal to the return value of input, quote unquote, do you agree, thereby asking the user the same question as before. No need to use the CS50 library because the input function here suffices. And instead of using c, I'm deliberately using s because it turns out in Python, there is no way to get a single character per se, but you can get a string that has a single character. Indeed, char is not a data type in Python. But once we have this input from the user, let's now go ahead and implement a conditional using one or more boolean expressions. Well, let's ask if s equals equals quote unquote capital Y, or s equals equals lowercase y, then let's go ahead and print out as before, quote unquote, agreed. And now notice what's different this time. I'm literally using the word or instead of the two vertical bars, because in the spirit of Python, things tend to be a little more English-like, a little more readable, top to bottom, left to right. And indeed, or hits that nail on the head.
Otherwise, if it is not a capital Y or a lowercase y, let's go ahead and print out, quote unquote, not agreed. And that's it for converting this program from C here into Python. But of course, this isn't the most robust version of the program, because it would be nice if the user could type in something like yes, capitalized maybe in different ways. So, how might we go about implementing that? Well, we could do this in a few ways. And let's go ahead and get rid of my C version now and focus just on the Python. I could do something like this and just start or-ing together more possibilities, like or s equals equals quote unquote yes, or s equals equals quote unquote YES, very emphatically, and so forth. But you could imagine that this doesn't scale very well. If I want to consider all the possible permutations, maybe of the caps lock key being up or down, that's quite a few possibilities to enumerate. So perhaps we could do this a little bit differently. And in fact, we can, by storing all of the possibilities in a so-called list. So whereas C had of course arrays, Python has what are called lists, which effectively underneath the hood are indeed linked lists as we explored in week five. Now a linked list of course can dynamically grow and even shrink. And that's indeed what Python does for us. I can simply create a list of values from the get-go. Or as we'll eventually see, I can add things to it, remove things from it, and all of the underlying memory gets managed for me. And in fact, with lists, we get a whole bunch of features that can make this possible. But for now, let's use them simply as statically initialized lists with values I know from the get-go that I want. And I'm going to go ahead and do this in VS Code. I'm going to delete most of this boolean expression, the combination of all of those phrases.
And I'm going to simply say if S is in using a Python keyword in, literally the following list of values quote unquote Y, quote unquote yes. And for now, I'm going to use just those two. But let's see how it works. Let me open up my terminal window again. Let me run python of agree.py. Really for the first time, but let me claim that it would have worked even in the previous version. Enter. I'm going to go ahead and type in lowercase y. And I've agreed. I'm going to go ahead and run it again and type in lowercase n. And I've not agreed. I'm going to go ahead and run it again. And I'm going to type in all caps. Yes, because I really agree. And yet I don't because there is a bug still in this version. So even though up here in my Python implementation I do have a list of values that I'm looking for, Python's going to look literally for those values. So lowercase Y and lowercase yes. So how can I go about tolerating different capitalizations by the user? Well, I can do this in a few different ways. 
I could, for instance, after getting the user's input in a variable called s, update s to be s.lower(), which is going to have the effect of lowercasing the word for me and then updating the value itself of s. And now I think this will work even for an uppercase version. Let me go ahead and run Python of agree.py, emphatically type in YES, enter, and this time I've agreed, because I forced the user's input to lowercase and then I have compared against the canonical forms I've written, which are all lowercase. I could have done the opposite. I could have forced the user's input to uppercase and then enumerated in my Python list, in between those square brackets, capital Y and capital YES. But either approach here is fine. Now technically I don't need this additional line here. I can go ahead and delete that line wherein I lowercased it, and in Python I can actually chain some of these function calls together by adding .lower() right after the call to input, so that the return value of input ultimately gets forced to lowercase by using lower here. Alternatively still, I could just lowercase at the very moment I'm actually comparing it, and down here I could do s.lower() and then compare the lowercase version to y or yes. Now what's this really all about? Well, this is actually an example of what's generally known as object-oriented programming, or OOP for short, whereby in Python and a lot of other languages now, you can have variables, and data types more generally, that have not only values associated with them like y or yes, but also functionality built in. In other words, whereas in C we would have used a function from the ctype library called toupper or tolower, and we would have passed as an argument to those functions the very character that we wanted to force to uppercase or to lowercase.
Well, in Python, and indeed object-oriented programming languages in general, the developers behind the language recognize that sometimes there's functionality that's inherently related to the values in question. And indeed, when we're dealing with strings, it's pretty reasonable to want to sometimes uppercase them or lowercase them, capitalize them, or do any number of other things. And so, built into the string type in Python is in fact the lower function itself, as well as a whole bunch of others. In fact, at this URL here, you can see the documentation for all of the string functions built into Python. More technically, when a function is built into a data type and you access it via this dot notation, instead of by calling some global function and passing an argument into it, you are using what are called methods. So methods are simply functions that are inside of objects. And in this case, the object in question itself is a string. So what's really happening with this example, when I'm checking whether the user has agreed or not, is I'm taking that value, that string s, which is technically now an object in memory, and inside of that object is not only the user's input but some built-in functionality, otherwise known now as methods, and those methods were written by the same people who invented the string data type itself. So this is just the first of these examples, but we'll see yet others. But notice the syntax is actually quite similar to C. Just as in C, when you wanted to go inside of a structure, you can similarly go inside of an object in Python and access not just the values ultimately, but also these built-in methods. All right, how about another comparison of C to Python, again involving strings? Well, let me go ahead and reopen and clear my terminal and close out of agree.py.
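To make the agree.py discussion concrete, here is a sketch of the list-membership check with and without lowercasing. The two function names are hypothetical helpers introduced for illustration; the lecture's program instead reads s from input and prints agreed or not agreed.

```python
def agreed_case_sensitive(answer):
    # `in` checks membership literally, so "YES" is not found here
    return answer in ["y", "yes"]


def agreed(answer):
    # lower() is a method on the str object; it returns a lowercased copy,
    # which is then compared against the all-lowercase canonical forms
    return answer.lower() in ["y", "yes"]


for typed in ["y", "n", "YES"]:
    if agreed(typed):
        print("Agreed.")
    else:
        print("Not agreed.")
```

Lowercasing once and comparing against a short list scales better than or-ing together every capitalization by hand.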
Let me go ahead and open up a version of copying strings from a couple of weeks back, whereby we finally started solving it correctly by doing some proper memory management. So here in the staff version, copy5.c, we have not only a commented version of what we did a couple weeks back, but we also have a reminder of what was involved in copying strings in C. Recall, for instance, that we prompted the user in this example, using CS50's get_string function, for a string that they wanted to make a copy of, and then we did some error checking to make sure that there was enough memory and nothing went wrong. Then recall that the right solution to this problem in C was not to just use the assignment operator and assume that s can be copied into t, but rather to allocate, using malloc, enough memory for the copy plus one more byte for the null character, again making sure that all is well by checking the return value of that, and then actually copying character by character by character the characters from s into the chunk of memory now known as t. Or ultimately, recall, we used a built-in strcpy function, which does all of that looping for us. And then when it came time to capitalize just the copy, we did a quick sanity check, is the length of t greater than zero, otherwise there's nothing to capitalize, and if so, go ahead and use the ctype library's toupper function, passing as input that specific character, t bracket zero, and updating t bracket zero itself. So here's an example of procedural programming in contrast with object-oriented programming. Again, I'm passing the argument to be uppercased into the toupper function, as opposed to simply going to that character and asking it, via some dot operator, to uppercase itself. Now I went ahead in the C version and printed out the two strings. I freed up the memory that I myself had allocated, and that was it for this program.
So, it was a decent amount of work, recall, in C to actually go about just copying a string. Well, as with so many things in Python, it's going to be so much easier. Let me go ahead and do this. Let me open my terminal window. Let me create a file called copy.py. Let me move it over to the right-hand side so we can see them side by side, closing my terminal window. And let's do roughly the same. Let's create a variable called s. Set it equal to, on the right-hand side, the return value of Python's own input function, because we don't really need CS50's own get_string function, and ask the user for s. Then let's go ahead and create a second variable called t. Set it equal to literally s.capitalize(), whose purpose in life, if we read Python's documentation for string methods, will be to uppercase the first letter of the word that the user has presumably just typed in. Then I'm going to go ahead and print out, as before, the user's input. And I can do this in a couple of different ways, but I'm going to use one of our format strings and say s, colon, and then interpolate that variable s by using my curly braces to say put the value of s here. Then I'm going to go ahead and print out t by saying t, colon, interpolate its value here inside of quotes, close parenthesis. So let's see if this works. Let me go ahead now and run Python of copy.py. I'm going to go ahead and type in, say, cat in all lowercase and hit enter. And now notice s remains in all lowercase, but the copy alone has indeed been capitalized. All right. Well, let's take a look at one other example involving strings, between C and Python equivalents. Let me go ahead and remind us that a few weeks back too, we created this uppercase program whose purpose in life was to prompt the user, using get_string, for a string, saying here's the before string. Then it prints out after, because the purpose in life of this program was to uppercase all of the characters in the string, not just capitalize the first one.
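The copy.py program described above can be sketched as below. The literal "cat" stands in for what input would return, and the prompt text in the comment is an assumption, since the transcript does not spell it out.

```python
s = "cat"  # stands in for: s = input("s: ")

# capitalize() returns a new string; s itself is left untouched,
# so there is no malloc, no strcpy, and nothing to free
t = s.capitalize()

# f-strings interpolate the variables inside the curly braces
print(f"s: {s}")
print(f"t: {t}")
```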
So, as you might expect, in that C version of uppercase we used a loop a few weeks back, and we iterated from zero on up to the length of the string, using plus plus to increment i in each iteration, and then each time we went ahead and printed out one character at a time. So, strictly speaking, we didn't change the string from lowercase perhaps to uppercase. We just changed each letter to uppercase and printed it out right away. Well, how might we do something similar in Python? Well, here too we have a couple of different approaches. Let me go ahead and open up my terminal now. Run code of, say, uppercase.py. Close my terminal window and let's drag this to the right so we can see them side by side. And let's do roughly the same. Let me create a variable this time called before. Set that equal to the return value of input and just prompt the user for that before string. Then after that, let's go ahead and print out preemptively after, colon, space, space, just to align everything nicely. But let me not print a new line yet, because I want to go ahead and see the following string on that same line. And then let's go ahead and do this analogously to the C version first, but then tighten things up. Here's how we can iterate in Python over every character in a string. I don't need to bother with i and indexing into the string or anything like that. I can, using a Python for loop, simply say for each character c in that string called before, go ahead and print out the uppercase version of that character. But don't yet print out a new line. But at the very end of this loop, go ahead and print out nothing but a new line. Let me go ahead and open my terminal. Run Python of uppercase.py. Enter. Type in cat in all lowercase. Cross my fingers. And after, each and every one of the characters is uppercased.
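A sketch of that character-by-character version follows. The literal "cat" stands in for the user's input, and the variable after is an addition of this sketch (the lecture's loop only prints), kept so the result can be checked.

```python
before = "cat"  # stands in for: before = input("Before: ")

after = ""  # not in the lecture's loop; collects the result for checking
print("After:  ", end="")  # end="" suppresses the usual newline
for c in before:
    # c is each character of the string in turn, no index needed
    print(c.upper(), end="")
    after += c.upper()
print()  # finally, print nothing but a newline
```

The same upper method that is called here on each one-character string can also be called on the whole string, which gives the same text in one step.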
And what's nice about this loop version, if nothing else, is that this for loop in Python there on line three is pretty elegant, whereby you implicitly get access to each character in the string, because that's how Python knows how to iterate over a string object. But it turns out we don't have to do this quite as analogously in Python as we did in C. We don't have to do it character by character, insofar as Python is object-oriented and these strings are objects and those objects have methods. Those methods will actually operate on the entire string at once, unlike the more pedantic work we had to do character by character in C. So in fact, let me go ahead and close the C version here, clear my terminal and hide it, and let's go ahead and make this quite a bit simpler. Let's get rid of the for loop altogether, and let's get rid of that print statement altogether, leaving only the before variable and getting the user's input. And now let's create an after variable. Set it equal to before.upper(), thereby uppercasing the entire string called before and assigning the return value to the after variable. And then let's go ahead and print, using our old friend the f-string, after, colon, space, and then interpolate the value of that after variable. So now we're down to just three lines. Let me go ahead and reopen my terminal. Python of uppercase.py, enter. Type in cat in all lowercase. And voila. Now I have uppercased cat all at once. All right. Before we take a break for some Fruit by the Foot, let's go ahead and take a look at Python's implementation of loops further. So in Scratch, recall that we implemented a loop with something like this. If I wanted to meow three times on the screen, I would literally use a repeat block. In C, it was a little clunkier to mimic that same idea. Like, we could declare a variable called i and set it equal to zero. Then we could ask a boolean expression, is i less than three?
If so, print meow and then increment i using our old plus plus friend, which in Python is now gone. In Python, we can do this almost the same, except I don't think we need the data type. I don't think we need the semicolon. We don't need the parentheses. While still exists. We don't need the curly braces. And we can't use the plus plus. We don't need the f in printf. I mean, we're mostly just trimming clutter from this implementation. So, this is the C version. This now is the Python version. A little tighter, a little easier to read. It's pretty much the minimal syntax available to get the job done. So, how can we actually have a cat meow in this case? Well, let me go into VS Code, and I'll stop doing everything side by side and just stipulate that we've done most of these examples previously in C. And in my first cat, well, I could certainly do it the easy way. Let me go ahead and create cat.py. And like we always started in the past with, I could just print meow and then use our old friend copy-paste. And this of course was bad for bunches of reasons, but it gets the job done. In Python, if I want to do this, well, I can just borrow that same inspiration and I could say set i equal to zero, then do while i is less than three, colon, then go ahead and print out meow, and then go ahead and do i plus equals 1, which is maybe the most succinct way to express that same idea. All right, just to confirm that this works, Python of cat.py. Enter. Meow, meow, meow. All right. So, how else can we do this? And how can we do this more Pythonically? This is perfectly correct. Many people might implement it this way, but it's not quite as succinct as we could alternatively do in Python. Yeah. >> Yeah. So, we could maybe use a for loop. And in fact, let's go there, because we don't quite have the same types of for loops in Python as we did in C. While loops are essentially the same, but for loops are actually a little bit different and actually a little bit better.
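The while-loop version of cat.py dictated above, written out as a short sketch:

```python
i = 0
while i < 3:
    print("meow")
    i += 1  # Python has no i++, so += 1 is the succinct form
```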
So, let me go into my code here, delete all four of these lines, and literally just say for i in this list of values, 0, 1, and 2, colon, print meow. In other words, in for loops in Python, you don't have the parentheses, you don't have the two semicolons, you don't have the initialization and the boolean expression and the update. You just say, a little more English-like, for each i in the following list, or for each value of i in the following list. And what Python will do for us is automatically, on the first iteration, set i equal to zero, on the second iteration set i to one, on the third iteration set i to two, and then there's only three things in the list, so that's it. And so just as before with the y and the yes example, where I used square brackets similar to arrays in C, I was using a Python list of strings in that case. Here I'm using a Python list of integers, 0, 1, and 2. And they're integers in the sense that they have no quotes around them, so they're obviously not strings. And I'm printing out meow this many times. And indeed, if I do Python of cat.py again, I get meow, meow, meow. This is correct. This is arguably better, at least in the sense that it's two lines of code instead of four. And it's arguably more readable as well. But what do you not like about this, perhaps, even if you're only seeing it for the first time? >> Yeah, it's going to be a lot more difficult to do things more than three times, because recall in Scratch at least, and in C, we had the ability to either express ourselves literally, or at least in C, we could just change that three to any number we want. 30, 300, no big deal. It's a super simple change, even though it was kind of annoying to type all of this out. Well, in Python, yeah, I could do this and say for i in 0, 1, and 2, just to mimic the numbers that we'd be setting i equal to in the C version. Frankly, this can be any list. It could be 1, 2, 3; 4, 5, 6; cat, dog, bird; or any three things whatsoever.
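A sketch of the list-based for loop just described. The seen list is an addition of this sketch, there only to record which value i takes on each iteration.

```python
seen = []  # not part of the lecture's cat.py; records each value of i

for i in [0, 1, 2]:
    print("meow")
    seen.append(i)

# Any three-element list drives three iterations, whatever it contains
for animal in ["cat", "dog", "bird"]:
    print("meow")
```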
But I'm just using 0, 1, and 2 for consistency with the way C would have done it. But slightly better than this is to use one of those other data types that was briefly on the screen earlier. We have not just floats and ints and strs and lists and tuples. We also have what are called ranges. And range is not only a data type in Python, but more literally a function that you can call to get a range of values from zero on up. So I can change this list of three values to a call to a function called range. Pass in how many things I want, and by default, per the documentation, I'll get back a list of numbers 0, 1, and 2. And nicely, Python's pretty smart about this. It technically doesn't hand you back all of the numbers at once, whether it's three or 30 or 300 or 3 million. It sort of hands them back to you one at a time. So you're not using more memory just because you're doing more iterations. So now if I do want to iterate four times, five times, 30 times, 300 times, I again can just change the single value. And if you want to be fancy too, you can skip numbers. You can count through just the odd numbers or the even numbers. You can change the incrementation factor. But the default and the most canonical is indeed just to count up like that. So if I go back to VS Code here and improve this, I can change that hard-coded list to just range of three, clear my terminal, run this cat one more time, and now I'm back in business as well. In fact, this is so common, let me throw up one alternative to this. You'll notice that in the previous example, both in VS Code and on the screen, I am not actually using i in any way. In fact, if you look back at how we converted the Scratch to Python code, I'm using i because when you use a for loop in Python, you have to give it a variable in some list or range of values. That's just the way it is. But I'm technically not using or printing i anywhere. And that's fine. And so it's arguably Pythonic, too.
If you have a variable out of necessity, but you're not actually going to use it for anything useful, just call it an underscore instead. And even though this is weird looking, an underscore is a valid symbol for a variable name in Python. So it is Pythonic to just use this, just to signal to yourself later and to colleagues that yeah, I'm using a variable because I have to, but it's not one I'm actually going to use elsewhere. It's a minor subtlety and not strictly necessary, but commonly done. All right, how about a couple final versions of cat then? So recall that if we wanted to do something in Scratch forever, we had a forever block, which literally did that. Well, in C, we couldn't quite translate that literally. So the closest approximation was probably this, while true, whereby you have a boolean expression that by definition is always true. So the loop is never going to stop, thereby infinite, if you wanted to print out meow, meow, meow on the screen ad nauseam. In Python, you can do it almost the same, but the curly braces are about to go, the f in printf is about to go, the backslash-n, the semicolon, and the parentheses. But for whatever reason, in C, we lowercase true and false. In Python, we capitalize True and False. So, a minor subtlety, but it's now indeed capital T, but the indentation has to be the same and the colon has to be there as well. So, with that, we can of course induce, intentionally or otherwise, some infinite loops. As with C, you can break out of them if need be with control-C to interrupt the process. But let's just see lastly with this cat how we can make it a little more abstract, like the final versions of our cat in Scratch and C. So let me propose to open up here a prior version of cat that we wrote in the past. It was version 12 at the time, which looked a little something like this.
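To recap the Python loop forms covered so far, here is a sketch using range, the underscore convention, and while True. The break statement and the count variable are additions of this sketch so that the example terminates; the lecture's while True loop runs until interrupted with control-C.

```python
# range(3) yields 0, 1, 2; _ signals that the variable is deliberately unused
for _ in range(3):
    print("meow")

# list() forces a range to produce all of its values for inspection
print(list(range(3)))
# start, stop, step: count up by twos to skip the odd numbers
print(list(range(0, 10, 2)))

# True is capitalized in Python; this loop would run forever without break
count = 0
while True:
    print("meow")
    count += 1
    if count == 3:
        break  # assumption of this sketch, to keep the example finite
```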
This was one of the final versions of our cat in C that simply allowed me in main to call a meow function that took an argument, which is the number of times I wanted to meow. This in C is how we implemented that helper function, so to speak, that returned nothing. So its return type was void, but it did take an integer called n as its input. And then there was a for loop inside of there that printed meow that many times. So long story short, this was how both in Scratch and in C we invented our own functions. Well, how can we do this now in Python? Well, let me bring this version of cat over to the right here. Delete that previous version. And let me propose that we do this. For i in range of three, let's go ahead and assume for the moment that there is a meow function in Scratch whose purpose in life is to just meow on the screen. Well, that of course does not exist. So, in Python, I'm going to use a trick that allows me to define my own function. And the keyword for this is literally def, for define, then the name of the function, and then parentheses if it takes no arguments. You don't need the void keyword even if it takes no inputs. So let's do a simpler version of the cat first that takes no arguments and then we'll add back that argument. How does a cat meow? It literally just says meow on the screen. So already we seem to have an improvement. I've got like four lines of actual code here versus like 20 or so on the left-hand side. Let's go ahead and run python of cat.py. Enter. And we see the first of our errors, which is remarkable because usually I would have messed up by now. So here we have in Python the equivalent of like a compiler error message. The program has not run. It's tried to run. It's tried to be interpreted, but it encountered some error. These are generally called tracebacks in the sense that you see a trace back in time of everything the program was trying to do just before it failed.
So if you've called a function which called a function which called a function, you'd see all of those function calls on the screen. I've just tried to call one function. So, it's a relatively short error. This is clearly a problem. And here's the type of problem: NameError. The name meow is not defined. So, intuitively, even if you're seeing Python for the first time, why is meow not defined even though it's literally defined right there? Yeah. >> Yeah. As smart as Python is vis-à-vis C, it's still kind of naive in that meow doesn't exist until line four. So, if you try to use it on line two, too soon. All right. So, in C, we fixed this problem initially by just kind of hacking things together: all right, well, let's just define it up here and then move that down there. And that's totally reasonable. And in fact, if I clear my terminal and rerun python of cat.py, we're back in business. But I'd argue you can only do that so many times, especially once you've got a bunch of functions. You don't want to relegate the main part of your program, which really this loop is, to the very bottom of the screen, if only because that's the first thing you care about and want to see at the top of the screen. And that's the whole point of putting main at the very top. So what was the solution in C? The solution in C was to put the prototype for the function at the top of the file. That though is not a thing in Python. You don't just copy that first line of code, put it at the top of the file, add a semicolon, and then it works. Instead, the Pythonic way to solve this problem, for better or for worse, is to actually put your code in a main function. Main in Python has no special significance in this sense. It's just convention to borrow the name that so many other languages use as the main function in those languages.
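The ordering problem described above can be reproduced in a few lines. This is only a sketch: the try/except wrapper is an addition (the lecture introduces exceptions later) so that the script can keep running after the error, and the variable name message is chosen here purely for illustration.

```python
# First attempt at cat.py: meow is called before Python has seen its definition.
# The try/except wrapper is an addition so the script continues past the error.
message = None
try:
    meow()
except NameError as error:
    message = str(error)
    print("NameError:", message)


def meow():
    print("meow")


# By this line the def statement above has executed, so the call succeeds.
meow()
```

Python reads the file top to bottom, so a name only exists once the line that defines it has run.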
But you just wrap your code in a function main so that you're defining main, then you're defining meow, before you're actually using the meow function per se. But I have made a mistake. If I run python of cat.py now, and cross my fingers for good measure, the program does nothing. Why is that? Yeah. Why is that? >> Oh, sorry. Go ahead. >> Yeah, curiously, I never called the main function. So whereas in C and in Java and C++ and a bunch of other languages main is special, like main is the function by definition that is automatically called, Python has no such special magic. It's not going to call main for you just because you created it. In fact, I didn't even need to call that function main. It's just a convention. But the solution is exactly that. Well, if the problem is that main wasn't called, at the bottom of this file what I can do is just literally call main, which we would never have done in C, but this is conventional to do in Python. So that after you've defined main up here and then defined meow down here, now you can call main, which in turn will call meow, but at that point in the story both of those functions exist. So if I go down here and run cat.py again, now I see my meow meow meow. Now let me add one final flourish, because this version of the code in C, recall, actually let me specify how many times I want to meow, whereas here I actually have my for loop in main at the right and I'm calling meow that many times. Well, what if I want to get rid of this loop over here and de-indent meow here and pass in literally the number three here? Well, in Python, you can just say inside of the definition of a function that it takes an argument like n. You don't have to specify the data type. Python's smart enough to figure it out. Then in your function, you can use that, as with for i in range of n, go ahead and print meow.
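Putting those steps together, cat.py at this point in the lecture might look roughly like the following. It is a reconstruction from the narration rather than the exact file shown on screen.

```python
def main():
    meow(3)


def meow(n):
    # n needs no declared type; range(n) counts 0, 1, ..., n - 1.
    for i in range(n):
        print("meow")


# Both functions have been defined by the time this line runs,
# so main can safely call meow.
main()
```

Leaving out the final call to main reproduces the "program does nothing" behavior: the two def statements only define functions, they do not run them.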
So now the right-hand version of this program is pretty much equivalent to the left-hand version of this program, as always using fewer lines of code. Let me go ahead and run python of cat.py. Meow. Meow. Meow. We're good. And then let me make one final change, if only because most every piece of documentation you see online or website tutorial on Python will actually have you not just literally call main at the bottom, but you'll do this crazy syntax that solves a problem that we won't trip over in this class. Typically it's Pythonic to actually call main after asking the question if __name__ == "__main__". This is a stupid mouthful of code that even I had to think about when I was typing it out, if I got all the underscores correct. But long story short, this convention of using a conditional before you call main allows you to write more modular code in Python so that some of your files don't actually do anything other than define, define, define functions that you can then import into other files you write. So in short, this is the right way to do it, even though in CS50 it is unlikely that we are to trip over this bug. Questions now on that last piece of how we define functions in Python. Yeah. >> Ah, good question and good eye. Why do I have two lines between my functions in Python? As you will see via style50, it is Pythonic, that is Python convention, to separate functions in your code by two blank lines. Whereas there is no such convention in C. So I'm trying to be consistent with what the world does. Yeah. >> If you want to count backwards in a loop, can you do that? Absolutely. You could use the range function in a different way. Start with a much larger value and count down. You could alternatively do that with a while loop. I would say that, yeah, you can make that work, but you shouldn't. People just don't do that unless it does actually solve a problem for you. Other questions on this? All right.
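As a recap of the loop and function conventions covered so far, here is a small sketch. It assumes the three-argument form range(start, stop, step) for skipping and for counting down, which the lecture alludes to without writing out, and it uses the underscore convention for an unused loop variable.

```python
def main():
    meow(3)
    print(list(range(1, 10, 2)))   # every other number: the odd ones below 10
    print(list(range(3, 0, -1)))   # counting down from 3, stopping before 0


def meow(n):
    # The underscore signals that the loop variable is required but unused.
    for _ in range(n):
        print("meow")


# __name__ equals "__main__" only when this file is run directly,
# not when another file imports it.
if __name__ == "__main__":
    main()
```

Wrapping each range in list here is only for display; in a for loop the range hands back its values one at a time.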
Well, when we looked at C, recall there were a bunch of things that ultimately we couldn't do well. We ran into issues of floating-point imprecision and integer overflow and truncation and all of these real-world problems. There's still going to be some of those, but first let's take a Fruit by the Foot break and we'll be back in 10. Help yourself to seconds today. All right, so we're back, and let's use our remaining time together to focus not only on some of the problems that Python can solve more readily than C, but also some of the problems that remain. So here was a program early on in our discussion of C that had this weird bug whereby when we implemented a relatively simple calculator to divide two numbers, x / y, we experienced what we called truncation at the time, whereby 1 / 3 was curiously zero and something like 4 / 3 was curiously one, and we were losing everything after the decimal point. And this was true even if we tried using floats, because with truncation, recall, everything after the decimal point with integer math is simply discarded. So if you do int divided by int, you're going to lose what is after the decimal point. So let's take a look in Python at whether this is still actually a problem. So let me go back into VS Code here. We'll close out the C version thereof and let's go ahead and create our own program called calculator.py. And in this version, let's modify the original, which just did some addition, and have it do some division instead. I'll get rid of my outdated comments and perform now division instead of addition by doing x / y. Python of calculator.py, let's try one and let's try three. And oh, our fractions are actually back. So it turns out in Python, even when you're manipulating integers, if you divide one by the other and the result logically should actually be a floating-point value, that's what in fact you're going to get back.
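A minimal sketch of that behavior, with the operands hard-coded instead of read from the user. It also shows the // operator, which performs integer (floor) division and which the lecture turns to next.

```python
x = 1
y = 3

z = x / y                  # true division: int / int gives a float
print(z)
print(type(z).__name__)    # float

print(x // y)              # floor division: 0
print(4 // 3)              # floor division: 1
```

For non-negative operands // matches C's truncating integer division; with a negative operand it rounds toward negative infinity instead, so -1 // 3 is -1 rather than 0.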
And you don't have to jump through the same hoops that we did before to actually force things to floats and then do floating-point arithmetic and so forth. In fact, if you want the old behavior, it's still actually there. And you can use two slashes in Python to use the old integer division as opposed to what we're seeing here. But a typical programmer, I dare say, nowadays would want it to behave in exactly this way. So truncation seems to be less of an issue for us, therefore. All right. Well, what other problems did we encounter at the time? Well, recall we had issues of floating-point imprecision whereby even when we divided something simple like one divided by three, and in grade school we learned that was like 0.333 repeating infinitely many times, we started seeing weird numbers that were not three at the end of that value back in the day in C. Unfortunately, that's a problem that's still with us. In fact, if I use this same program here, let me go into VS Code and instead of printing out just x / y, let's go ahead and do this temporarily. Let me give myself a variable called z and set it equal to x / y, only because it'll be a little easier to see the formatting trick I'm going to use. Let's go ahead and print out a format string that prints out z. And for the moment, let me just claim that this is going to do the exact same thing. It's just completely gratuitous that I'm using an f-string now as opposed to just printing out z. But if I do 1 / 3, we're still seeing 0.333. But we're only seeing just over 10 or so digits here. What if we want to see like 50 digits and really start poking around at what's being represented? Well, the syntax is a little weird, but in Python, using an f-string, you can do tricks similar to what we did with %f with printf in C.
And if after my variable's name in this set of curly braces I do a colon and then a dot, because I want to see numbers after the decimal point, and say something arbitrary like show me 50 digits after the decimal point, and treat this as a float. This is a crazy incantation, I do think, of a format string. Even I am sort of cheating off of the paper in front of me, but this is how you format strings if you want to see them with a little more precision, or so I think. If I rerun python of calculator.py and do one divided by three, darn it, we're still in the same mess that we were before. Now, why is this? Well, it's still the case that I'm running the code on the same kinds of computers that I did before. It's still the case that these computers only have a finite amount of memory. And so, even though I'm manipulating clearly floating-point values, Python is only allocating, say, 64 bits to those float variables. And so, there's only so much precision that's possible. And so what we're seeing is essentially the closest representation to an infinite number of threes that we can represent using binary, using a floating-point representation. So still a problem, but I do think in Python you'll find that there are so many more libraries out there, third-party software that comes not just with the language itself but from others, whereby you can use libraries for more precise scientific computing that essentially implement their own versions of floating-point values so that you can use not 64 but 128 or more bits than that when it really matters to some level of precision. Thankfully though, one problem is at least solved for us, namely integer overflow.
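The formatting trick and both precision claims can be sketched as follows. Two assumptions to flag: the lecture mentions third-party libraries for extra precision, whereas this sketch substitutes Python's standard-library decimal module simply because it needs no installation; and the integer lines at the end preview the overflow point that the lecture makes next.

```python
from decimal import Decimal, getcontext

z = 1 / 3
formatted = f"{z:.50f}"    # colon, dot, 50, f: fifty digits after the decimal point
print(formatted)           # the digits stop being 3s once the 64-bit float runs out

# Swapped-in technique: decimal arithmetic with a precision chosen by the programmer.
getcontext().prec = 50     # 50 significant digits
precise = Decimal(1) / Decimal(3)
print(precise)

# Python ints grow as needed, so passing the old 32-bit limit does not wrap around.
big = 2 ** 31 - 1
print(big + 1)
print(2 ** 100)
```

The Decimal result is still finite, so it is a longer run of threes rather than an exact one third; the point is only that the precision is adjustable.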
So recall that this was another problem we ran into whereby if you try counting higher than, say, 4 billion, or even higher than 2 billion if you're representing negative numbers, which halves the total range that you have available to you in the positive range, we ran into the situation where it somehow wrapped around, became negative, and then even ended up being zero as a result. Well, Python wonderfully nowadays just gives you more and more bits as needed if your integers are getting larger and larger. So this is a wonderful feature, and we've at least addressed one fundamental limitation we ran into in C, and this time the language itself provides us a solution. Python, too, has some pretty handy features as well. One of them is what are called exceptions. And so an exception in Python is a way of handling error conditions without relying on return values alone. So recall that in C, if you ever wanted to signify that something went wrong, you had to return, most recently, NULL, N-U-L-L, which was a special sentinel value. Technically it's just the zero address, and by checking for that you can make sure that you know if you're getting back a valid pointer or not. And in other functions, if something went wrong, you might similarly have to check the return value, maybe checking for zero or negative one or one or something like that. But return values were the only way in C that functions could communicate back to the programmer that something went wrong. And this is problematic because if you imagine implementing a function that's supposed to return maybe an integer, whether positive, negative, or zero, it's kind of unfortunate sometimes if you have to steal one of those values and say, uh-uh, you can't use this value. It's fine in the world of pointers because the world decided years ago, we're never going to use the actual address 0x0, the zero address. But that's still technically costing us one or more bytes of space.
But in general, it's a bit annoying if your function can't truly return all possible values. Think about a function like get_string. If something went wrong in get_string, what do you want to return? Well, we saw in the CS50 library for C, we do in fact return NULL, once we introduced that. But in general, wouldn't it be nice if functions could somehow signal out of band, so to speak, that something went wrong? So, by that I mean this. Let's go into a new program that's inspired by one of our programs today. And in VS Code, I'm going to go ahead and close my calculator, open my terminal window, and create a new program called integer.py. So in integer.py, let's just play around with some integers and see what we can break. So here, I'll define a variable called n and set it equal to the input function, which comes with Python, just asking the human for some input. Then I'm going to go ahead and ask a question. Is the user's input numeric? And it turns out if you read the documentation for strings in Python, they come with not just an upper function and a lower function, aka methods, but also an isnumeric function or method that tells you whether or not the string itself happens to be numeric, that is, looks like a number. All right. So I think if I do that, I could then do something like this. If n is numeric, I'm going to go ahead and claim that in fact it is an integer. Else, if it's not numeric, I'm going to claim that it's not an integer. I have no idea what it is. Maybe it's cat. Maybe it's dog. Maybe it's a mix of numbers and letters, but it's definitely not an integer as defined by a sequence of decimal digits in this case. All right, so let's try this out. Python of integer.py. Enter. We'll type in one. That's an integer. We'll type in two. That's an integer. We'll type in zero. That's an integer. Type in cat. Not an integer. So that seems to in fact work. But what if I wanted to immediately convert this to an int, as we did in the past?
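The check just demonstrated can be sketched like this. The helper name describe and the output strings are assumptions made for illustration, and the samples are hard-coded rather than typed in with input so that the snippet runs unattended.

```python
def describe(text):
    # str.isnumeric() is True only if the string is non-empty and
    # every character in it is a numeric character.
    if text.isnumeric():
        return "Integer"
    else:
        return "Not integer"


for sample in ["1", "2", "0", "cat"]:
    print(sample, "->", describe(sample))
```

One limitation worth knowing: a signed value such as "-1" fails this check because the minus sign is not a numeric character, even though int("-1") would succeed.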
And so let me modify this a little bit here and say instead that n equals not just input, asking the user for an integer, or rather let's just ask them more generally for input, but let's assume that we want to convert this input to an int. And actually we can go ahead and say integer here. All right. Well, here I'm going to go ahead and just print out the claim that yep, this is an integer, because if we get to line two, well, clearly we've handled the user's input correctly. In other words, how can I get away from constantly checking the return values of functions to make sure they are what I expect? All right. Well, let's go ahead and run python of integer.py now. Enter. Type in one, tells me it's an integer. Type in two, tells me it's an integer. Zero, tells me it's an integer. Type in cat. Notice this time what goes wrong. Whereas last time we saw this kind of traceback error message it was a NameError, because I was using the meow function name too early, now I'm getting a ValueError, which is a different type of error that relates to invalid literal for int() with base 10: 'cat'. Now that's a mouthful. So unfortunately Python's error messages aren't all that much better than clang's error messages. But clearly the interpreter does not like the fact that I'm passing something to int related to base 10, but that's quote unquote cat. And really, the best you can do with this kind of error is realize like, okay, it's clearly the case that cat is not an integer. So, it's having trouble converting cat to an integer. It makes no logical sense. All right. So, what's the gist of the problem? Well, I'm just blindly converting the user's input to an integer, even if it's not an integer. Well, all right.
Well, I could rewind to the previous version of my program, use the isnumeric function, and then conditionally convert it, but I'm trying to move away from constantly checking return values for errors. And wouldn't it be nice if I could somehow catch this ValueError and just deal with it if it happens? And in fact, you can with Python exceptions, which exist in other languages as well, Java among them. You have the ability to sort of listen for errors happening inside of functions without having to rely on return values alone. So, let me go back to VS Code here, clear my terminal just to simplify things a bit, and let me literally say to the interpreter, please try to execute the following two lines of code, except if something goes wrong, like a ValueError, in which case go ahead and print out something like not integer. So, wouldn't it be nice if you could just wrap all of the code you've written in CS50 thus far with try and sort of ask the computer politely, like, please try to execute this code? But that really is the semantics behind it. Try to execute these lines of code, except if there's an error, then do this other thing instead. And therefore, you don't have to check any return values. You can just blindly pass the output of the input function as the input to the int function, knowing that if something goes wrong inside of there, Python is going to execute this other code instead, the except clause, when something goes wrong. So let me go ahead and run python of integer.py now. I'll type in one and that works because it's trying to execute line two and succeeding. It's trying to execute line three and succeeding. So lines four and five never actually kick in. But if I try again here with cat, line two is going to fail. Line three is never going to get reached because Python is immediately going to jump to this exception handler, so to speak, thereby catching the error or the exception and printing not integer instead.
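The try/except structure narrated above can be sketched as follows. The function name check is an assumption made so the logic can be exercised without typing; in the lecture the same two steps operate directly on the result of input.

```python
def check(text):
    try:
        n = int(text)          # raises ValueError for input such as "cat"
        return "Integer"       # reached only if the conversion succeeded
    except ValueError:
        return "Not integer"   # the handler Python jumps to on failure


for sample in ["1", "0", "cat"]:
    print(sample, "->", check(sample))
```

Unlike the isnumeric version, this also accepts "-5", because the decision is left to int itself rather than to a separate check that has to mirror its rules.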
So it's a little bit of a weird convention. It's different from what C offers, but a lot of newer languages nowadays do offer this because it's a better way of just writing code that you know should work 99% of the time. But if something does go wrong, out of memory, the human types something wrong in, or something like that, you can handle all of those exceptional cases, exceptional in a bad sense, using this except keyword instead. Questions on any of this here technique? Yeah. >> A really good question. In this case, I used a ValueError. Do I need to define every possible thing that can go wrong? Short answer, yes. Now, there aren't terribly many. There are some standard ones and they're all capitalized in this way: capital letter, capital letter, something Error, typically. You can even invent your own. And it's good practice to enumerate the kinds of things that you think can go wrong. ValueError is pretty generic, but there could be memory-related errors. There could be file-not-found-related errors. There's a bunch of different exceptions that are all documented in Python that you can listen for. That said, as nice as Python's documentation is overall, it is not good at documenting for specific functions what exceptions they can throw. And I've never understood this after all of these years, that no human has gone into the documentation and painstakingly enumerated all of the possible things that can go wrong. What's too often the case in the real world, with some of my own code included, is if you encounter an exception that you didn't think was going to happen, you go in and improve your code and add to this list of except clauses. What else might go wrong? Shouldn't be that way. And different libraries are better about documenting these things. All right.
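To illustrate enumerating several exceptions and inventing one of your own, here is a hedged sketch. The function read_first_score, the NegativeScoreError class, and the idea of reading a score from a file are all hypothetical, chosen only because the answer above mentions file-not-found errors and custom exceptions.

```python
class NegativeScoreError(ValueError):
    """Hypothetical custom exception for a score below zero."""


def read_first_score(path):
    try:
        with open(path) as file:
            score = int(file.readline())
        if score < 0:
            raise NegativeScoreError(score)
        return score
    except FileNotFoundError:
        print("File not found")
    except NegativeScoreError:
        # Listed before ValueError because it is a subclass of ValueError.
        print("Negative score")
    except ValueError:
        print("Not an integer")
    return None


read_first_score("no_such_file_for_this_sketch.txt")
```

Python checks the except clauses from top to bottom and runs the first one that matches, which is why the more specific exception has to come first.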
Well, with that in mind, let me propose that in the CS50 library for Python, get_int and get_float work just like the C library, whereby if you type in cat or dog or bird into those functions, they just reprompt you. And long story short, this is the kind of code we wrote in Python. Try to get input from the user, except if something goes wrong, prompt them again, prompt them again. So, we too were using precisely these features even though it wasn't something that was available to us in C. All right. But something else that we did in C was play around with Mario in a few different forms. And in lecture, recall, a few weeks back, we experimented with using some ASCII art, some very simple text, to print out something like this pyramid of height 3. Well, how can we go about printing something like this? Well, I would propose that if I go back to VS Code here, let's close out my integer examples and code up a new version of Mario in mario.py. This one's kind of simple. I can say something like for i in range of three, go ahead and print out, quote unquote, a hash. Down in my terminal window, python of mario.py, and I've got really the closest analog to three bricks stacked on top of each other in this way. But in C, eventually, our implementation of Mario started to get a little fancy and we started to prompt the user for the height of the wall, and therefore we could have not just three but maybe four or even more bricks being printed. So, let me actually open up that version from a few weeks back, whereby from week one we had a version of Mario that looked like this, whereby we, after including some header files, declared in main a variable called n. Then we saw a new construct at the time, a do-while loop that just keeps calling get_int, get_int, get_int so long as n is not one or greater, equivalently so long as n is less than one, and kept prompting the user again and again. The reason for having n up here, recall, was issues of scope.
This way it's accessible lower in the function, as opposed to it being confined to those curly braces. And then down here we used a for loop to actually print out that many hashes. So in short, the do-while loop solved the problem in C whereby you want to get user input at least once and maybe again and again and again if they don't cooperate the first time. And that's where do-while loops really shine. Do something at least once and maybe again and again and again. Otherwise, it's a little more annoying to do it with while loops or for loops. Unfortunately, Python does not offer a do-while loop. And so here too, we have an opportunity to introduce you to what the world would call Pythonic. What is Python's solution there too? Well, on the right-hand side here in mario.py, let's change this a little bit and let's go ahead and do while True, capital T. Go ahead and use a variable n. Set it equal to int of input of height, asking the human for the height of the wall. And I'm going to just cross my fingers that they're not going to type in cat or dog or something that's not an int. In this case, I'm going to say if n is greater than zero, that is, a positive number, that's useful. We can proceed. I'm going to now break out of this loop. And then lower in the file, I'm going to say for i in range of n, go ahead and print out the hashes. So we still have that same lesson as before, like the Python version seems to be shorter, more concise, even if you ignore the comments on the left-hand side. And I've completely avoided using a do-while loop. But there are a few things that are different nonetheless that feel like, versus C, shouldn't even work. Like what's weird about this solution even though I think it's actually correct? Yeah, >> I have two. >> Okay, so it's not correct. That's one of the first things to point out. So, too many prepositions: this was supposed to say for i in range. Okay.
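The corrected program can be sketched roughly as below. Two assumptions: the loop is wrapped in a hypothetical get_height function, and that function accepts a read parameter standing in for input so the snippet can run with canned answers instead of a keyboard. As in the lecture version, non-numeric input such as "cat" is not handled here.

```python
def get_height(read=input):
    # read is a hypothetical hook; by default it is the built-in input.
    while True:
        n = int(read("Height: "))
        if n > 0:
            break
    return n


def print_wall(n):
    for i in range(n):
        print("#")


# Canned answers stand in for a human who needs three tries.
answers = iter(["0", "-2", "3"])
height = get_height(lambda prompt: next(answers))
print_wall(height)
```

Calling get_height() with no argument prompts a real user; wrapping the int call in try and except ValueError, as described earlier, would make it reprompt on non-numeric input as well.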
So, now that this program's correct, what looks weird to you and probably could break it? Yeah. >> Yeah. So, the n variable seems to be scoped to the while loop, at least insofar as it's indented inside the while loop, which feels analogous to being inside of curly braces in C. And so it seems weird that I'm presuming to use n on line six even though it was only defined on line two. It turns out this is possible in Python. The issue of scope that we encountered in C is not as rigorously enforced, we'll say for today, such that when you define n up here, you can actually use it down here. And you can think of this as being a little reasonable because there's no more specification of what data type n is and no more semicolon. Just imagine, it would look kind of stupid if you just put a blank n there and hit enter just so it kind of exists. There's no way to express the idea of create this variable in advance without actually assigning it a value. Whereas in C we could do that. So this is in fact okay and correct. What else is going on here? Well, instead of a do-while, we're kind of just implementing the idea of it. I'm just blindly inducing, deliberately, an infinite loop, like do the following forever, but then as soon as I have the answer I want, like a positive integer from the human, break out of this loop. And this is indeed the Pythonic way to get user input, because this will minimally ask the user for a height once and maybe more and more times. So no do-while loops, only while loops and for loops, and only while loops are really the same as in C. Even for loops, we've seen, are a bit different. All right. Well, how about instead of just that Mario example, recall this one where we wanted to print like four question marks in the sky side by side. Well, we can do this in a few different ways. Let me go back to VS Code, close the C version, and let's just completely change mario.py to implement this. Now, I want four question marks in the sky.
So, I think I can do something like for i in range of four, go ahead and just print out, quote unquote, question mark. Do you like this? Python of mario.py. Should I run it? No. Why? This is how I did it in C. Yeah. >> Yeah. I've got to edit the end value, the named parameter for the print function, because otherwise if I hit enter, they're all on different lines, which is not the effect I want when all four question marks are meant to be side by side. All right. Well, that's an easy fix. I can pass the named parameter called end into the print function. Set it equal to quote unquote, an empty string, with double quotes or with single quotes. As always, stylistically, I would be consistent. So, I'm going to use double quotes even though the documentation is consistent with its single quotes. Now, I'm going to rerun python of mario.py. And I'm so close. Now, they're on the same line, but the stupid cursor didn't move to the next line. That's fine. How to fix this? Well, just logically, I can put a blank print statement below. And even though I'm not passing anything in, you get a new line for free when calling print. So even though I'm not passing in any arguments, I am getting the aesthetic effect that I want. So that is a perfectly reasonable way to do it. Now, if you feel yourself becoming a bit of a geek though in learning about Python and previously C, you can even solve this problem even more Pythonically by saying print, quote unquote, question mark, times four, using multiplication similar in spirit to the plus operator for concatenation. And now you multiply the question mark by four. So now if I go down here and run python of mario.py, I get a very elegant solution to exactly that same problem, even more concisely than my previous version. What if I want to do something in two dimensions? Well, recall that we moved to the underground of Mario Brothers here and we had like a 3x3 grid of bricks. How can we do that?
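Both ways of printing the row of question marks can be sketched side by side. The two function names are assumptions used only to keep the versions apart; the lecture writes these lines directly in mario.py.

```python
def question_marks_loop(n):
    for i in range(n):
        print("?", end="")   # end="" suppresses the automatic newline
    print()                  # a bare print() moves the cursor to the next line


def question_marks_product(n):
    print("?" * n)           # a str times an int repeats the string


question_marks_loop(4)
question_marks_product(4)
```

Both calls produce the same single line; the second simply builds the whole string before printing it once.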
Well, in C, we had nested for loops using i and j back in the day. And I could do the same thing in Python. Let me go back into VS Code here and let me do one outer loop, for i in range of three. Then let me do an inner loop, for j in range of three. Then let me go ahead and print out a hash. But let me learn from my past mistakes. I don't want to print out a new line every time. So let's override that default. But after each row, let's print a new line. So that down here, I can go in mario.py, run it, and I've got my 3x3 grid of bricks. I could change this a little bit and call these row and column. Even though here too, even more so, I'm not literally using row and column anywhere explicitly, semantically it kind of explains maybe a little more clearly to the reader what's actually going on. So that might help. But we could tighten this up too, right? If I just want to print a 3x3 grid, well, I know that the top thing here will iterate three times. And I know how to very elegantly print things out with a one-liner. So I could just print out a hash times three in this case. And then down here, I can go to python of mario.py. And voila, I'm back in business too. So it's just sort of easier to do these kinds of things and express yourself all the more succinctly. Well, what else can we do? Well, it turns out in Python that, unlike arrays in C, you can ask lists how long they are. So you don't have to keep around a variable of how large an array is. You can just add stuff to a list and then ask Python, how long is this list? How many elements are in it? Case in point, let me go back to VS Code and clear out mario.py, and let's reimplement from a few weeks back the notion of calculating the average quiz score that you might have in a class. So in score.py, let's go ahead and create a program that's got a list called scores of three scores that we've seen before, 72, 73, and 33. And recall that we tried a few weeks back in C to average these together.
And to do that, we had to add them all together. We had to divide by the total number of elements in the list. It wasn't that hard. It was sort of like grade school arithmetic to calculate an average. But Python has more functions available to us. Not just length, but even summation. So let me go ahead and do this. Let me say that my average variable shall be the sum of those scores divided by the length of those scores. And indeed, per the documentation, Python has a length function, len for short, and a sum function, which adds together all of the elements in that list. And so down here now I can say something like print, with an f-string or format string, that the average is whatever that value is. And I don't have to do any loops or math myself. I can just call the function like I could in Excel or Google Sheets or Apple Numbers. python of score.py, Enter. And my average is in fact 59.3333, and then some weird imprecision at the end there. And in fact, just for consistency with our C code, let me rename this. I'm going to rename score to scores, plural. That's going to close the window. But now at least you'll see online that we have a program indeed called scores. Well, this is not that interesting because I've just hard-coded my 72, my 73, and 33. What if we want the human to be able to type that in? Well, I think we can do that, too. So, let me actually open up that version of the file, now pluralized. Let me go ahead and not initialize the list for the human, but let me set it equal to an empty list, just using an open square bracket and close square bracket, like an array that has nothing in it. But this one is literally of size zero at the moment. And now let me do for i in range of, let's just for now ask the user for three scores, even though we could certainly ask the user how many scores they want to input and then use that number instead.
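As a minimal sketch of the averaging step described here, assuming the same three hard-coded scores from the lecture, the built-in `sum` and `len` functions do all of the arithmetic. The variable names follow the lecture's description; the exact print wording is an assumption.

```python
# Hard-coded quiz scores, as in the lecture's first version of the program
scores = [72, 73, 33]

# sum() adds the elements together; len() reports how many there are
average = sum(scores) / len(scores)

# Interpolate the result with an f-string (the wording here is illustrative)
print(f"Average: {average}")
```

No loop and no separate counter variable is needed, because the list itself knows its own length.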
So in each of these iterations, let's ask the user for a score using something like int of input of, quote unquote, score. I'm going to set aside the reality that if the user types in cat or dog, the whole thing's going to break, and therefore I should really add my try and my except. But I'm going to discard that error checking and focus only on the essence of this program for now. Now after line three, if I have in a score variable the user's quiz score, how do I put it into that array, or rather, into that list? Well, with an array, I had to use the square bracket notation, keep track of how big it is, and use like bracket i or something like that. No longer in Python, because a list is an object that has not only data but functions, aka methods, associated with it. I can just call a method that comes with every Python list called append and pass in that score, using that same dot notation as before. The rest of my code can stay exactly the same. If I now run python of scores.py and I type in 72, 73, 33 manually, I still get that same average. And notice I did not need to decide in advance how big that list of scores was going to be. Questions on what we've just done with lists? No. All right. Even cooler, for some definition of cool, is that we can now implement hash tables, or more generically dictionaries, sets of key-value pairs, by just using a data type that comes with Python. I claimed last week that dictionaries, and hash tables in particular, are sort of the Swiss Army knives of data structures, in that they just let you associate some piece of data with others. With Python, you do not need to jump through the hoops that you needed to with problem set five, implementing your own spell checker and your own hash table. You just create a dict object in Python, a dictionary that gives you the ability to associate keys with values. So, case in point, let's do this.
Let me go back into VS Code and close out scores.py, and let's create a new and improved version of our phone book in phonebook.py. Let's go ahead and come up with a list of names, just to demonstrate how we could store a bunch of names in the phone book irrespective of numbers, and set those equal to, say, Kelly's name and my name and John Harvard's name, just by putting three quoted strings, or strs, inside of this list. Now let's ask the human, using the input function, for the name that they want to search for in this list. And now let's implement linear search using Python. I can do this in a bunch of ways, but one way is to say for each name, we'll call it n, in names, go ahead and ask the question: if the name I'm looking for equals the current name in the list that I'm iterating over, go ahead and print out just something generic like found, and then break out of this loop. And let's see if we can find Kelly or David or John or someone else. python of phonebook.py, Enter. Searching for the name, say, David. Enter. And it was in fact found. Let me go ahead and search for someone else's name that's not in there, Brian. And now it's not in fact found, although it's not all that enlightening to just ignore the question altogether. It would be nice to say not found. And here is where in C it would be kind of non-obvious how to do this. If you wanted to print out found or, if you get through the whole list and you still haven't found the user, print not found, you'd have to keep track with a variable of whether or not you found the person, or you'd have to return from the code prematurely just to get out of it logically. Turns out, somewhat weirdly but wonderfully usefully, for loops in Python can have else clauses associated with them, whereby I can say down here print not found. If I run this version of the program and search for someone who's not in the phone book, like Brian, now I actually see not found.
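The for/else pattern just described can be sketched as follows. This is only an illustration under stated assumptions: the lecture reads the name with `input` and prints inside the loop, whereas this sketch wraps the loop in a hypothetical `search` function that returns the message, so it can be exercised without typing anything.

```python
names = ["Kelly", "David", "John"]


def search(name, names):
    """Linear search with for/else (hypothetical helper, not from the lecture)."""
    for n in names:
        if name == n:
            result = "Found"
            break
    else:
        # This clause runs only if the loop finished without hitting break
        result = "Not found"
    return result


print(search("David", names))
print(search("Brian", names))
```

The `else` belongs to the `for`, not to the `if`, which is why its indentation lines up with the loop.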
Semantically, it's a little weird, but essentially what's happening is, if you get through this whole loop and you never call break, then you've not actually broken out of the loop. So, you're going to hit the else. And in that case, you're going to print out not found. And this is such a common thing, to do this kind of bookkeeping and keep track of whether or not something has happened inside of a for loop, and if so, do this, else do that. Else literally handles that scenario in Python. And this is the most un-C-like thing that we've perhaps seen in terms of features, with regard to at least loops. All right. Well, this is great that I've kind of implemented linear search, but we did that in C and it's getting a little tedious. Can't we do better? We actually can. Let me clear my terminal and tighten this up. Instead of iterating over every name in names, just like we keep iterating over integers in ranges, and checking for each name if it equals the thing we're looking for, you can actually do something much more clever. You can just literally ask Python if the name you're looking for is in the names list, then go ahead and print out found, else print not found. And so this is where Python, too, gets kind of cool. In line five, you have just a simple if condition with a Boolean expression, name in names. How does Python know if name is in names? It uses linear search, presumably, to search over the whole list of names looking for what you care about, and then tells you true or false if it found it. You don't have to write the code to iterate over it with a while loop or for loop or whatnot. You just say what you mean. And so here too, it's a little more English-like. If name in names, question mark, then print found, much more so than it would be pronounceable in C. So that's one other cool feature that we now have at our disposal. What's yet another?
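A minimal sketch of that tightened-up version, assuming the same three names. As before, the hypothetical `lookup` function stands in for the lecture's `input` and `print` calls so the logic can be run directly.

```python
names = ["Kelly", "David", "John"]


def lookup(name):
    """Membership test with `in` (hypothetical helper for illustration)."""
    # For a list, `in` scans the elements one by one on our behalf
    if name in names:
        return "Found"
    else:
        return "Not found"


print(lookup("David"))
print(lookup("Brian"))
```

The whole loop, the comparison, and the break collapse into the single Boolean expression `name in names`.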
Well, when it comes to dictionary objects in C, or rather in Python, a dict object really just gives you a set of key-value pairs. And we've seen this kind of chart before, whereby we might have name and number, and name and number, and name and number. How do we translate this to code? Because in C, as with problem set 5, it was going to be quite an undertaking to be able to store a whole bunch of things in memory in the form of something like a hash table. Well, in Python, we can actually define a dictionary ourselves. So, these square brackets represent a list, but I can alternatively use curly braces for a very new purpose. I'm going to go ahead and hit Enter just to move the second curly brace to a new line. And I am going to now enumerate a bunch of key-value pairs. Namely, quote unquote Kelly for the first key, colon, then we'll do +1-617-495-1000 as the number. Then I'm going to go ahead and do quote unquote David for the second key. And since we both work here, I'm going to go ahead and just use that same number, as we've done before. Then a third key for John Harvard, colon, and for John, we'll use +1-949-468-2750, which is fun to call or text. This now, even though it's syntactically a little different, gives me the equivalent of this chart here, key-value pairs, where the keys are the staff names and the values are the staff numbers. That implements all of that, a hash table, if you will, in Python's own syntax. So, how do I now use this? Turns out I can actually use it in exactly the same way. I'm going to go ahead and generalize this now to people, because it contains not just names but names and numbers. So I'm going to change this variable down here to people too. But notice the syntax now. I can still ask the human for a name they want to look up. I can now still say if the name is in the people dictionary. And by definition, Python's going to interpret that preposition, in, as meaning: is the following key in the dictionary?
And if so, it's going to return true. But what's cool about this is that, besides just making this work as follows, python of phonebook.py, and let's type in David. And there's my number. Oh, that's not my number. It just says found. Let's run it again and type in, say, Brian. Not found. Okay, that's as expected. But I'd like to know what my number is, or Kelly's number or John's number. Well, that's an easy fix, too. Inside of this conditional, I can say something like this: number equals people bracket name. And we've not seen this before, but we have seen square brackets in C when we had arrays. This square bracket notation is how you indexed into an array to get a specific value: 0, 1, 2, 3, 4. What's amazing about dictionaries, not just in Python but in other languages as well, is that you can now index into a dictionary just as you can index into an array. But whereas with an array you use numeric indices, with dictionaries you use string indices. You can use strings to look up their corresponding value. So to be clear, name at this point is given to us by the human's input. So if I typed in D-A-V-I-D, name equals David. So this is like saying people, square bracket, quote unquote David: find David's number. That stores the answer from this two-column chart in the variable called number. And all that remains is for me to print it out, which I can do using an f-string. Now, let me go down into my print statement, change this to an f-string, add a colon, add the number variable to be interpolated, rerun this program as python of phonebook.py, type in my name, and there's my number as found. And this is incredibly powerful. And it's why, again, hash tables, and in turn more generally dictionaries, are sort of the Swiss Army knife. Being able just to look up data with such simple syntax is wonderfully useful and powerful. And in fact we can even do more than this. For instance, let me propose that if you think about other incarnations of key-value pairs, you see them all the time.
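The dictionary lookup walked through above can be sketched like this. The keys and phone numbers are the ones read out in the lecture; the hypothetical `find_number` function is an assumption that replaces the interactive `input` call so the example runs on its own.

```python
# Keys are names, values are phone numbers (stored as strings)
people = {
    "Kelly": "+1-617-495-1000",
    "David": "+1-617-495-1000",
    "John": "+1-949-468-2750",
}


def find_number(name):
    """Look up a name in the dictionary (hypothetical helper)."""
    # For a dict, `in` checks whether the name is one of the keys
    if name in people:
        # Index into the dictionary with a string instead of a number
        number = people[name]
        return f"Found: {number}"
    return "Not found"


print(find_number("David"))
print(find_number("Brian"))
```

Checking `name in people` first matters: indexing with a key that is not present, as in `people["Brian"]`, raises a `KeyError`.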
For instance, in spreadsheets. Here's a screenshot of Google Sheets, whereby I've got the beginnings of a spreadsheet with names and numbers. But in this model, I want to actually associate some metadata with my data. So the data I care about is the actual names and numbers. But you could imagine having a third column like email address, and maybe home address, or any number of other pieces of data associated with these three people. For now, I've just got two columns, or two attributes, names and numbers. Each of the rows in a spreadsheet, as most anyone knows who's used a spreadsheet before, represents a different record or a different piece of data: this is Kelly, this is David, this is John, and so forth. We can implement this idea using dictionaries and lists together. So the syntax is going to be a little strange at first, but let me go back to VS Code here and let me change my people dictionary to be a people list between square brackets. And the elements of this list now are going to be dictionaries themselves. I'm going to use some curly braces inside of these square brackets and say that the name of one person is quote unquote Kelly, and the number for that person is quote unquote +1-617-495-1000, close quote, then a comma on the outside of the curly braces. Then I'm going to have another quote unquote name, colon, David, comma, then another number, colon, and I'm going to borrow the same phone number because we both work here. Then lastly a comma, and finally quote unquote name, colon, quote unquote John, and then lastly a quote unquote number for John, colon, +1-949-468-2750. All right. So what's going on here now? Our people variable is now not just a simple dictionary with just individual key-value pairs: name, number, name, number, name, number. We now have a more generalized way of storing not just a name or a number but an email address or a home address or any number of other values. How?
Well, the commas just separate the key-value pairs now. So, if I do have email addresses for us, I can put comma, quote unquote email, colon, like [email protected], and I can just keep adding these key-value pairs to each of the dictionaries, because a dictionary is a collection of key-value pairs. So it stands to reason that I can associate name with David, number with the number, email with that email address, and so forth, effectively implementing this idea now in the computer's memory. And at the risk of significantly oversimplifying, this is what Google and Microsoft and Apple are doing with their spreadsheet software. They have written code that presents to you a nice table with a graphical user interface on the screen, but underneath the hood, what they effectively have is lists of dictionaries representing each of those rows. And we're going to come back to this when we start experimenting before long with our own databases. When we get back rows of data from databases, we are going to store that data in lists of dictionaries for the same reason as well. So, how can we use this? Well, let me hide my terminal for a second and tweak the program just a little bit. I'm still going to get the name of a person to look up their number. I'm still going to, how about, iterate over this, because I've lost the ability, at least for now, to just ask a question like, is this name in the structure? Because it's a list, I do now need to iterate a little bit differently. So I'm going to do for each person in the people list, go ahead and check: is the current person's name equal to the name I'm looking for? And if so, go ahead and create a variable called number, set it equal to that person's number, and then go ahead and print out, for instance, found, colon, then in my curly braces that specific number, and then after all that break out of this. So this is a mouthful, but recall that it's all the same syntax we've seen before in smaller parts.
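A sketch of the list-of-dictionaries version described here, assuming only the `name` and `number` keys (the optional `email` key is left out). The hypothetical `find_number` function returns from inside the loop where the lecture's script prints and then breaks; that substitution is only so the example can run without user input.

```python
# Each element of the list is one "row": a dictionary of key-value pairs
people = [
    {"name": "Kelly", "number": "+1-617-495-1000"},
    {"name": "David", "number": "+1-617-495-1000"},
    {"name": "John", "number": "+1-949-468-2750"},
]


def find_number(name):
    """Search the rows for a matching name (hypothetical helper)."""
    for person in people:
        # person is one whole dictionary on each iteration
        if person["name"] == name:
            number = person["number"]
            return f"Found: {number}"
    return "Not found"


print(find_number("John"))
print(find_number("Brian"))
```

To look up a different attribute, such as an email address, only the key inside the square brackets would change, provided every dictionary actually has that key.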
Square bracket and square bracket means here comes a list. What are the elements of this list? dict, dict, dict: three dictionaries back to back to back, each of which has a key and a value and a key and a value, called name and number respectively. The second one temporarily has name and number and email as keys plus three values, and the third one has keys of name and number as well with their corresponding values. So when I iterate over each person in the people list, that means on each iteration person is going to be set to this dictionary, then this dictionary, then this dictionary. On each iteration I'm asking this question: is the value of that person's name key equal to the name I'm looking for? And if so, create a variable called number, set it equal to the value of that person's number key, and then just print it out. And if we wanted email instead, I'd tweak the word number to email. If I want to look up anything else, you can tweak that code there. But being able to index into dictionaries using strings is sort of the fundamentally powerful new technique that we have here. Question now on any of this? Yeah. >> If both >> Good question. If you wanted both name and number on the screen, do you concatenate? Sure, you could do that. Or print them out by passing a comma into the print function and printing them out that way. Absolutely. However you want to format it. And actually, just as an aside too, even though this becomes a little less readable, this is a little silly that on line 11 I'm declaring a variable called number only to use it one line later and then never again. Technically, with those curly braces and format strings, I could just take this code on the right, plug it into those curly braces, and get rid of this variable altogether. Just at some point, though, f-strings start to get a little too hard to read with quotes inside of quotes.
And so I kind of prefer being a little more pedantic about it and explicitly putting it in a variable and then interpolating just that variable. But you could do it in different ways still. All right, a couple of final features of Python that'll get us on our way with doing other things. Turns out there's a whole bunch of libraries that come with the language itself that you nonetheless have to import. Even though they're not third party, and you didn't have to install them, you just need to add them to your code by importing them. One of them is sys. And among the things that the sys library has in Python is the ability to give you access to command-line arguments. After all, we've lost access to command-line arguments because there's no more main, at least by convention. There's no int main void. There's no int main with argc and argv going on in our code. But all of that functionality is still available in a library called sys. So how do we use this? Well, let me go back to VS Code here now. Let me create a relatively simple program called greet.py, similar to a few weeks back, that's just going to greet the user using command-line arguments instead of get_string or the input function. I'm going to do this by saying from the sys library import argv. In this case, argv is essentially just a list. It is a list of the command-line arguments that the human has typed. It's a list, which means you can just ask the length function, len, what its length is. So, there's no need for argc anymore. You can just literally ask argv how long it is, which is kind of nice. So, I'm going to say this: if the length of argv equals 2, which means the human typed two words at the prompt, okay, let's go ahead and greet them, assuming that's their name, and say hello, and then whatever their name is. Let me make this a format string.
And to be pedantic, let me create a variable called name and set it equal to argv bracket 1, which is going to be the second word that the human typed in, as has been our convention in the past. Else, if they didn't type exactly two command-line arguments, let's just go ahead and print out something like hello, world as generic. Let me run python of greet.py, Enter. And you see hello, world, because I apparently did not type in exactly two words, and yet I did. So let's see where this is going. Let me rerun python of greet.py but type in my name, David, at the command line. Enter. And huh, I screwed up unintentionally. What did I do wrong? All right. printf is not a thing. So that's an easy fix. Let's delete it. Let me clear my terminal window. Rerun python of greet.py, space, David. Enter. And now I get hello, David. The only thing that's weird here is that I typed in three words at the prompt and yet I'm checking for two. And it's a bit subtle, but with Python, argv ignores the Python interpreter. It goes without saying that you're using the Python interpreter to run a Python program. So the only things that are being counted are the words after the Python interpreter itself. So when I type greet.py and David, that's two. When I only typed greet.py, that's one instead. All right. So now that I've done that, I have access to my command-line arguments again. What about my exit statuses? This was getting a little low level, but in recent C programs, we've had you all returning zero on success, returning one on error. Can we still do that? Well, yes. And in fact, the sys library is used for that as well. So if I want to actually add some exit statuses to a program to facilitate check50 and automated tests in the real world, I can do that with a program called, let's call this, exit.py. And in exit.py, I'm similarly going to import sys, but in a different way.
I'm going to give myself access to, well, yes, let's go ahead and import the whole library, just to demonstrate how you can access things inside of it without explicitly saying from sys import such and such as before. If the length of sys.argv, so this is a little bit different, but I'm asking the same kind of question, does not equal 2, I want to go ahead and print out to the user missing command-line argument, which is something we did a while back as well. And then I want to exit with code one: sys.exit of 1. Else, if I don't run into that issue, I'm going to go ahead, actually, let's not even bother with an else. For parity with our C version, let's do this: print, with an f-string, quote unquote hello, sys.argv bracket 1, close quote, then sys.exit of 0. All right, that's a whole mouthful, but what's really going on? So, I could have done from sys import argv, but I don't need to enumerate every single variable or every single function that I want from a library. I can also just more generally say import the whole library. Give me access to everything, and then I'll tell you what I want from it later. Therefore, on line three, I can still access argv. I just have to scope it to the sys library. So I say sys.argv, not argv, which means go inside of that library and find me argv, rather than importing it as a variable unto itself in my own code. Why am I saying not equal to two? Well, if they don't give me two words after the interpreter's name, I want to yell at them and say missing command-line argument and then exit with one. I'm not going to give them a default hello, world anymore. I want them to give me their name. Meanwhile, if I get this far and I haven't exited from the program, I can print out sys.argv bracket 1, which is going to be David in the example I typed before. And this means success. So sys.exit of 0 signifies success. It's different syntax than it was in C, but we have the exact same functionality available to us as we have in the past.
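A sketch of the exit.py logic described above, under a few assumptions. In the lecture the checks sit at the top level of the script and read `sys.argv` directly; here they live in a hypothetical `main(argv)` function, and a second hypothetical helper, `exit_code`, catches the `SystemExit` exception that `sys.exit` raises, so both outcomes can be shown in one run. The exact message strings are also assumptions.

```python
import sys


def main(argv):
    """Greet the user named on the command line, exiting 1 on misuse."""
    if len(argv) != 2:
        print("Missing command-line argument")
        sys.exit(1)  # nonzero status signals an error
    print(f"hello, {argv[1]}")
    sys.exit(0)  # zero status signals success


def exit_code(argv):
    """Run main() and report the status it tried to exit with (demo helper)."""
    try:
        main(argv)
    except SystemExit as e:
        # sys.exit() works by raising SystemExit; e.code holds the status
        return e.code


# Simulate `python exit.py David` and then `python exit.py`
print(exit_code(["exit.py", "David"]))
print(exit_code(["exit.py"]))
```

In an actual script, the last lines would instead be a single call such as `main(sys.argv)`, and the shell would receive the exit status.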
How about one other example that we've had in the past? Let's convert it to Python as well, so you have a few more tools in your toolkit. How about implementing a version of this phone book that actually persists? So instead of hard-coding into it Kelly and David and John in this way, let's actually let the user type in a name and a number, just like on your iPhone or Android phone, and add it to a text file like a CSV file, as we did before, using comma-separated values. Well, it turns out that Python comes with a library to handle CSV files. We don't need to hackishly implement our own CSV support by printing the commas ourselves. Instead, we can import the csv library. We can then create, say, a variable called file, set it equal to open, and open a file called phonebook.csv in append mode. So this is almost the same as C, except it's open instead of fopen, which we saw a couple of weeks back. Now let's ask the user via the input function for the name they want to add to their contacts and the number that they want to add to their contacts. And then after that, let's go ahead and do this, which is a bit of muscle memory to remember, but I'm going to create a variable called writer, though I could call it anything I want. Set it equal to csv.writer, which means there's a function called writer in the csv library that I'm accessing that way because I didn't import it explicitly by name. And I'm going to pass it that file. This tells Python, turn that file into a CSV that can be written to. On the next line of code, I'm going to literally say writer.writerow. writerow is a method, aka function, associated with this writer object. And I know that only because I did actually read the documentation for the csv library. What do I want to write? Well, I want to write a list of values, namely a name and a number. And I'm using square brackets to tell the writerow function that here you go, here's a list of values, two of them, a name and a number.
After all that, I'm going to do file.close and just close the whole file. All right, so where does this actually get me? Well, let me go ahead and open up phonebook.csv, which is initially empty. I'll move this over to the right-hand side. But when I now run this program with python of phonebook.py, Enter, I'll type in, say, Kelly's name. Enter. +1-617-495-1000. Enter. And voila, it ends up in the CSV, using a little bit less code than we had to last time with C. Let's run it once more. And I'll type in my name. And I'll again use +1-617-495-1000. Enter. It's being appended to that file as well. And one last time for John. +1-949-468-2750. Enter. Voila. So it's pretty easy, that is to say, in Python to start creating files like this. But this isn't really Pythonic. Let me in fact close the CSV file, hide my terminal, and propose that we can tighten up this code a bit too. I don't need to open up the file way up here. I can go ahead and get my variables' values this way first. And in fact, I could have done that code a little later anyway, but I can do this in Python. I can say with the following file opened, phonebook.csv in append mode, and refer to it as a variable called file, do this stuff and close the file yourself. So this program is suddenly significantly shorter, because this one line has the effect of opening the file for me in append mode, assigning it to a variable, doing this stuff, and then as soon as the program's indentation ends, and there's code over here or no code whatsoever, the file gets closed for me automatically. This just helps us avoid memory leaks and stupid mistakes we've made in C, because you forget to close a file that you had opened and you don't necessarily notice unless you run Valgrind or something on it. Python tries to avoid this by giving you a new keyword, with, that doesn't really make sense semantically except as with the following file open, and it will close the file for you.
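The `with` version of the program can be sketched as below. Several things here are assumptions made for the sake of a self-contained example: the hypothetical `add_contact` function takes the name and number as arguments instead of calling `input`, the file is created in a temporary directory rather than next to the script, and `newline=""` is passed to `open` because the `csv` documentation recommends it (the lecture does not mention it).

```python
import csv
import os
import tempfile


def add_contact(path, name, number):
    """Append one row to a CSV file (hypothetical helper)."""
    # The file is closed automatically when the indented block ends
    with open(path, "a", newline="") as file:
        writer = csv.writer(file)
        writer.writerow([name, number])


# Use a throwaway location so the sketch doesn't touch real files
path = os.path.join(tempfile.mkdtemp(), "phonebook.csv")
add_contact(path, "Kelly", "+1-617-495-1000")
add_contact(path, "David", "+1-617-495-1000")

# Read the file back to see what was written
with open(path, newline="") as file:
    rows = list(csv.reader(file))
print(rows)
```

Because the file is opened in append mode, each call adds a row after whatever is already there, which mirrors running the lecture's program several times in a row.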
So that's, too, among the features that you sort of get with Python. The catch, though, is that this CSV is fairly simplistic. In particular, it's missing a header row that actually indicates what is in each of the columns. In fact, if I go ahead and run code of phonebook.csv, we'll see again that the file contains just one row for Kelly, for me, and for John. Whereas, ideally, it would look a little something more like this Google Sheets version, which actually has in the very first row something like name and number, which then describes the data therein, after which are the three actual rows. Now, the simplest fix here, frankly, would probably be to just start with name, comma, number at the top of the file and then assume that my phonebook.py program is just going to append, append, append additional rows to the file containing the names and numbers respectively. I could have done that from the get-go. And in fact, that would be better than putting some code inside of phonebook.py that writes out that specific row, because after all, if I'm running this program again and again, I don't want the header row to appear again and again and again, unless I complicate the program a little bit to ensure that I only do that once. But assuming that I do go into phonebook.csv and from the get-go do have a file that contains name and number, we can actually start to improve upon the implementation of phonebook.py, because we can take advantage of the fact that my writer can actually use that same header. In fact, let me put these files side by side here. And then in phonebook.py, let's go ahead and transition away from using a writer to using a so-called dictionary writer, or DictWriter for short, capital D, capital W. And then let me go ahead and specify one additional argument to this particular function, namely fieldnames, which I know exists because I looked it up in the documentation.
And the value of this argument is supposed to be a list of the fields that are presumed to exist in the CSV that we're about to write to. So I'm going to do quote unquote name, quote unquote number. The line's a bit long, so it's scrolling there. But if I scroll back to the left, we'll see that the line is otherwise unchanged. But when I go down now to write each respective row, notice that I don't have to rely on this list, which just assumes somewhat naively that name will always be in the first column, or column zero, and number will always be in the second, or column one. After all, if someone were to move that data around, at least in the spreadsheet using Excel or Google Sheets or something else, my code would end up being fairly fragile, because at the moment it's just assuming blindly that name goes first followed by number. But once we have that header row in there and tell DictWriter about it, we can actually now pass in not a list but an actual dictionary of key-value pairs and let the dictionary writer figure out which column in the file those values should go in. So inside of this dictionary, I'm going to have one key called name, the value of which is indeed the name the user typed in. The second key is going to be quote unquote number, the value of which is the number that the user typed in. And let me go back actually now and fix a typo from earlier. We're only asking the user for one number. So all this time I should have just requested one number, aesthetically, with my input function there. Now notice I have the file ready to go. Indeed name and number are there, which matches the field names I've provided to my code, and it matches the key-value pairs that I'm subsequently passing to writerow. So let's go ahead and give this a try. Let me go ahead and run again, with this CSV file otherwise empty save for the header, python of phonebook.py. Enter.
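The program as described up to this point might look roughly like the sketch below. As assumptions for a runnable example: the header row is written by the sketch itself (the lecture adds it to the file by hand), the file lives in a temporary directory, `newline=""` follows the `csv` documentation's recommendation, and the hypothetical `add_contact` function replaces the two `input` calls.

```python
import csv
import os
import tempfile

# Start from a file that already contains only the header row
path = os.path.join(tempfile.mkdtemp(), "phonebook.csv")
with open(path, "w", newline="") as file:
    file.write("name,number\n")


def add_contact(path, name, number):
    """Append one row via DictWriter (hypothetical helper)."""
    with open(path, "a", newline="") as file:
        # fieldnames tells the writer which column each key belongs in
        writer = csv.DictWriter(file, fieldnames=["name", "number"])
        # Key order in the dictionary doesn't matter; fieldnames decides
        writer.writerow({"number": number, "name": name})


add_contact(path, "Kelly", "+1-617-495-1000")
add_contact(path, "John", "+1-949-468-2750")

# DictReader uses the header row to label each value when reading back
with open(path, newline="") as file:
    rows = list(csv.DictReader(file))
print(rows)
```

Passing the dictionary with its keys deliberately out of order shows the point made in the lecture: the writer, not the caller, decides which column each value lands in.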
I'm going to now go ahead and type in, say, the first name, which was Kelly before, +1-617-495-1000, and watch what happens at top right. Kelly and her number end up in the file, even though I didn't actually specify explicitly, as with a list or numeric indices, which value goes where. Let's run it once more and put in myself again. +1-617-495-1000. Enter. And there again I am. And lastly, just for good measure, let's go ahead and put John back in the file with +1-949-468-2750, which if you still haven't called or texted, do feel free. Enter. And voila, in phonebook.csv, we have all of those same rows, and code that's a little more resilient now against any changes we might subsequently make there, too. All right, how about now some final flourishes using some other features of Python that we did see a glimpse of some time ago, namely the ability to install libraries of our own choice. So, up until now in cs50.dev, we, CS50, have pre-installed most of what you need, including back in the earliest weeks of the class, when we had that cowsay program that I wrote that was using a third-party library that I had installed into my codespace in advance. Well, you can use a program called pip to install Python packages into your own codespace and, if using your own Mac or PC, onto your own Macs and PCs as well, if those libraries are freely available as open source online and in the repository from which the pip program actually draws. Let me go back to VS Code and let me go ahead and create a new program called cow.py. And with this program, I'm going to go ahead and import that library, cowsay. And after that, I'm going to call cowsay.cow of, quote unquote, this is CS50, to have a cute little cow on the screen say exactly that. Now, in a previous lecture, I had pre-installed this library. But suppose I had forgotten to do so today. Let's see what other type of error we'll see on the screen. Well, let me go ahead and run python of cow.py. Enter.
And there's another one of those tracebacks. This one's a little more straightforward than the NameError and the ValueError we saw in the past. This is literally a ModuleNotFoundError: no module named cowsay. Well, this is where the pip command comes in. If something hasn't been pre-installed for you in cs50.dev or, in the real world, on whatever system you're using, you can use pip install cowsay and, assuming you've spelled it correctly and assuming the library is publicly available, hitting Enter will result in pip automatically downloading the latest version, installing it, in this case into your codespace, and solving, hopefully, that problem. Let me clear my terminal window, run python of cow.py again. Definitely cross my fingers. And there is the most adorable cow. And if we full-screen the terminal, we'll see that he's indeed saying this is CS50. Now, that's just one of the things we can install with pip. I could also install libraries onto my own Mac or PC. In fact, in just a moment, I'm going to switch over to another computer here where I have a terminal window open on my own actual Mac. And I'm doing this because I'd like to play around with some text-to-speech library functionality, which you can't really do in cs50.dev because it's browser-based, and when you run code in the cloud, it's not going to pass the audio along to the speakers on your laptop or desktop. But if I'm running Python and my own code on my own computer, a Mac in this case, or a PC in someone else's case, I can install that kind of library, text-to-speech, and have my own code on my own computer use my own speakers to verbalize some string quite like that. So, how can I go about doing this? Well, having read some documentation, I'm going to go ahead and install with pip a library called pyttsx3, Python text-to-speech version 3.
Hitting Enter finds and downloads the library as needed, if it's not already installed, and then brings me back to my terminal. And I'm going to use an older-school program here called Vim, or vi, to actually implement a cow program on this computer, whereby I'm going to go ahead and write some code using this library without VS Code but with just another text editor instead. To do this, at the very top of my file, I'm going to import this library called Python text-to-speech, so pyttsx3 for version three, and then I'm going to use only three lines of code to synthesize some voice. I'm going to create a variable called engine and set it equal to pyttsx3.init, because the documentation taught me that I need to initialize the library the first time I use it. I can then use this variable called engine to actually say something, quite like Scratch, albeit verbally instead of pictorially, like quote unquote this is CS50. And then lastly I can use engine.runAndWait, similar to some Scratch block, so that the whole expression is actually verbalized before my program actually quits. Now, the first time I run this, it might take a moment for the library indeed to initialize itself. But on my own Mac here, I'm going to run python of cow.py. If we could raise the volume just a little bit, hopefully we'll not see but hear this cow's greeting. >> This is CS50. It was very much in a rush to say it, but after initializing for that long. And if we ran it again and again and added some optimizations, we could get it talking much more quickly than that. But we now have a version of the program that indeed verbalizes what string, or str, it is that I've passed into it here. >> CS15. >> It's really in a rush to finish there. All right. But let's try one final flourish of another library that's fun to play around with, if only because it'll motivate some of the things you can now do in Python yourself. Let me go into VS Code in my codespace because this one does not require my speakers.
I'll close that first version of the cow and I'm going to go ahead and create a QR code generator after installing with pip a library called qrcode, which I read about online, and now it's installed in my codespace. I'm going to now go ahead and create a file called qr.py. So let's go ahead and code up qr.py, and I want to generate my own QR codes. Most of you, if you've ever generated a QR code before, are in the habit of probably just Googling around for some generator online for which someone else wrote code to generate the QR code. But I can do that for myself and actually generate my own images. I'm going to go ahead and import the library that I just installed: import qrcode. And then below that, I'm going to create a variable called, for instance, image and set that equal to this library's qrcode.make function. No relation to the make that we use for C. And I'm going to make a QR code containing a URL, maybe of one of the lecture videos. So let's do https, youtube.com, the short version, and then xvfz j5 p g, uh, gg0, if I got that just right. Then after that I'm going to go ahead and call image.save to save that URL as a file called quote unquote qr.png. And then PNG will be the format, which is Portable Network Graphics, which is akin to a JPEG or a GIF but with different features. I'm just going to double-check my writing here, so we go to the right lecture video, and I think we are indeed good. And what that should do after running my code is leave me with today's final flourish, a PNG file in my codespace that, when opened, is going to be a QR code that you can scan with your phone. So if you'd like to get ready for this final flourish, I'm going to go ahead and run python of qr.py and hit Enter. Thankfully, it worked. I'm going to now open up qr.png and close my terminal window.
And for our final moments together here in week six, after which we'll ultimately transition to yet more languages and problems to be solved, here is a final code for you to scan of today's lecture. All right, that's it for today. We'll see you next time. >> [applause] [music] >> All right. This is CS50 and this is already week seven, wherein we introduce another programming language, this time known as Structured Query Language, or S-Q-L, or sequel for short. Now SQL, as we'll see, is a different sort of programming language that allows us to solve a lot of the same kinds of problems that we've been dabbling with over the past several weeks, but arguably, in a lot of contexts, it allows us to solve those problems more easily. Indeed, among the goals for today is to demonstrate that sometimes there are multiple tools that you can use to solve the same problem, whether it's C or Python or today's SQL. But we'll also see that SQL allows us a different sort of approach to solving problems. Whereas C very much so, and Python to a large extent, are procedural programming languages, whereby you have to write these procedures, functions, step by step that tell the computer what to do, including loops and conditionals and all of that, SQL is said to be a declarative programming language, which is a different sort of paradigm whereby, when you want to solve some problem, you essentially declare what problem you want to solve or you declare what question you have, and it's up to the programming language to figure out, using loops and conditionals and all of those lower-level building blocks, how to get you the answer.
So ultimately today is all about teaching you yet another language, mostly so that you can learn again to teach yourself new languages and to appreciate that once you exit a class like CS50 and are out there in the real world, it really isn't all that big a deal to pick up new programming languages, especially when in advance you've seen different programming paradigms, like procedural, like object-oriented, like, today, declarative as well. But today ultimately is also about data, and so to get us started we thought we'd collect some real-world data by asking all of you a couple of questions. So, if on your laptop or phone you would like to pull up this URL here, it will also exist in just a moment in QR code form. So, if you'd like to go to that URL there or simply scan this here QR code with your phone, that's going to lead you to a Google Form. For those unfamiliar, Google has lots of tools, among which is a tool via which you can ask people questions via forms. Microsoft has something similar as well. And at that URL, what you'll soon see is a form that looks a little something like this. Among those questions are which is your favorite language, at least among those we've studied thus far. So go ahead and anonymously answer the questions you see on this form. You'll see which is your favorite language and also which is your favorite problem in problem sets thus far. And meanwhile, as you might know if you've used Google Forms yourself to collect data, we can move from questions here to actual responses. And as people start to buzz in, we'll see that the data set here is starting to update in real time. And Google gives us these nice graphical user interfaces, or GUIs, via which we can analyze the data. And so far, Python is easily the winner with 70% plus of you preferring it, 11% of you wishing we were still in Scratch, and 18% of you in C. And you'll see the responses are coming in here.
But for our purposes today, what's more interesting than the actual answers to these questions is how we can get at the raw data. So among the things you can do in Google Forms is quite literally click View in Sheets. And what this is going to allow me to do is access the underlying raw data. Now, because Google has forms and spreadsheets, they sort of tied these two products together. But what's especially nice about Google spreadsheets is that I can also download the raw data as a file. I can download it as an Excel file, a text file, a PDF. But for today, we're going to download it in a very common format known as CSV, for comma-separated values. And indeed, if I go to the File menu, Download, comma-separated values, this is perhaps the most straightforward, easiest way to get raw data out of any kind of tabular data like this to load it into code that we are about to write. So, if you haven't buzzed in already, that's fine. But at this point in time, now that I've clicked the button, I now have a CSV file in my Mac's downloads folder, which, if I go ahead and open it up here, I can see that indeed I've got this long-named file, favor-form responses 1.csv. I'm going to shorten that file name to just favorites.csv. And what I'm going to go ahead and do is open up VS Code. And in my file explorer, I'm going to literally just drag and drop favorites.csv from my Mac. That's going to have the effect of uploading the file as it was at that moment in time so that we can now begin to write some code using this file. And VS Code has automatically gone ahead and opened it up for me. And what you're looking at here is what we're going to start to call a flat-file database. It's a very lightweight database: a database in the sense that it stores a lot of data, and a flat file in the sense that it's literally just a text file. And by convention, the way the data is stored in this file is indeed by separating values with commas.
There are other conventions as well, but CSV is probably the de facto standard. But TSV is a thing, for tab-separated values, as is PSV, which is pipe-separated values, where you might have a vertical bar. Essentially, these file formats try to use a character that might not appear in the actual data so as to separate your rows and columns. So indeed, if I switch back to VS Code here and we take a look at the data, you'll see that from Google Sheets, I've been given three columns: timestamp, which was automatically generated for me, the language, as well as the problem. And what I see here is that we had a few respondents buzz in a little early, very excited for today's data. But here's the rest of them from like 1:30 p.m. Eastern onward. And you'll see, separated via commas, are effectively three columns of data. So everything before the first comma represents a timestamp. Everything between the first and second comma represents the choice of language that you all buzzed in with. And then everything after the second comma represents the problem. Now, it's kind of jagged edges. It doesn't line up in nice rows and columns because some answers are longer, some answers are shorter, but the commas are sufficient to tell the code we write where one column ends and the next one begins. So, how do we go about writing code like this if we'd now like to ask some questions about the data, like what is the most popular language? What is the most popular problem? Or conversely, the least of each of those. Well, we could look at the original data in Google Forms, and that's where we got the pie chart. But how is Google figuring out what the most popular answers are and what pie charts it wants to depict? Well, they probably wrote some code not unlike what we're about to do. Although, we'll start with just a command-line environment as always. So, within VS Code, I'm going to go ahead and do this. I'm going to go ahead and open up a program called favorites.py.
And let's write a program whose purpose in life is to open the CSV file, read it top to bottom, left to right, and then crunch some numbers, figure out what the most popular answers are to those questions. So, I'm going to go ahead and import a package that comes with Python, a library called the csv library. And nicely enough, this is just code that someone else wrote years ago that figures out how to read data from a file, separating it via commas, so that you and I don't have to write all of that ourselves. Then, I'm going to use this Pythonic convention: with open, quote unquote favorites.csv, as file. Though, if I want to be super explicit that I intend only to read this file, which is the default, I'm going to go ahead and explicitly say quote unquote r, just like we did in C when using fopen to open a file in read mode. And now I'm going to do this. I'm going to go ahead and say reader equals csv.reader(file). So, this is a Python convention whereby the csv library comes with a function called reader that takes as its sole argument here a file that has already been opened. And what that reader will do is figure out where all of the commas are so that I can iterate over this reader in a loop and get back row after row after row without me having to write all of the code to figure out where those commas are. So what I'm going to do in this block of code is, for each row in that reader, let's go ahead and just print out maybe the second column, which was the language column. So I'm going to go ahead and say print row bracket one, because what we'll see is that this reader, which again comes with Python, hands me a list, a list, a list for each of the rows, wherein bracket zero would represent the first column, bracket one would represent the second, bracket two would represent the third, because everything is zero-indexed in Python.
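The reading loop described above might look roughly like the following sketch. To keep it self-contained it reads from an in-memory string rather than favorites.csv; the sample rows and timestamps are made up, the exact header spellings are an assumption, and second_column is a hypothetical helper name rather than anything from the lecture.

```python
import csv
import io

# Made-up sample rows standing in for favorites.csv; header names are assumed.
SAMPLE = (
    "Timestamp,language,problem\n"
    "10/20/2025 13:30:05,Python,Hello\n"
    "10/20/2025 13:30:09,C,Mario\n"
    "10/20/2025 13:30:12,Scratch,Starting from Scratch\n"
)


def second_column(file):
    """Return row[1] for every row that csv.reader yields."""
    reader = csv.reader(file)  # splits each line on commas, one list per row
    return [row[1] for row in reader]


# With a real file this would be: with open("favorites.csv", "r") as file: ...
for value in second_column(io.StringIO(SAMPLE)):
    print(value)
```

Note that csv.reader treats every line alike, so the first value it hands back comes from the very first line of the file.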
All right, so let's see what the effect is here. Let me maximize my terminal window, run python of favorites.py, cross my fingers that I got this right, and voila, there is every language that was selected by you all in the form, from top to bottom, by default chronologically. But there's a bit of a bug, I dare say. Let me scroll up and up and up in this output through all of these answers until I get to the very top where I ran the program myself, which is here, python of favorites.py. There's a minor bug here. What's the bug in the output? Yeah? >> Yeah, it accidentally includes the header, which is a bug in the sense that I really just wanted to see the languages, but the code is doing what I told it to, which is just print out every row. So, there's a few ways we could ignore this. Let me go ahead and minimize my terminal window and let me go ahead and say, well, you know what? After we create this reader, let's just skip to the next row and ignore it effectively, and then begin iterating over everything thereafter. And so what happens now is, if I remaximize my window, rerun python of favorites.py, Enter, and now scroll up again to the beginning of this incarnation of the program, you'll see that the very first thing I see after my program was run was indeed Python, Python, Python, Python, and so forth. No more quote unquote language. So, how is that? Well, this is a feature we haven't quite seen before or talked about in much detail, but this reader is stateful in some sense. And this was actually true of all of the file I/O we did in C, whereby when you were using fread or some other function to read data from the file, something was remembering where it was in the file so that you didn't get the same bytes again and again and again.
It was more like a cassette tape, an old-school cassette tape if you will, or a scrubber along the bar along the bottom of like any streaming video, whereby when you just read some data, it grabs the next chunk, the next chunk, the next chunk, the next chunk, and something inside of the computer's memory remembers where it is. So, this says skip to the next row. And thus, when you do for row in reader, you get everything but the first row because the reader is stateful. It remembers where it is in memory. All right. Well, thus far this isn't all that useful because all I'm doing is just printing out the data. But let's take a step toward making this program a little more useful. In particular, let's just be a little more pedantic and specify that what I'm really doing here inside of this loop is figuring out what the current row's favorite is. So, I'm going to create a variable called favorite and set that equal to row bracket one. And then even though this doesn't change the functionality, I'm going to print that favorite just because semantically, stylistically, it's nice to know what row bracket one is, as by defining a variable that tells me or anyone else who reads this code in the future what it's actually doing. All right, but readers are only so useful. And in fact, if I were to open up this CSV file, maybe in Microsoft Excel or Apple Numbers or Google Sheets, again, you could imagine someone kind of moving the data by just dragging one of the columns to the left or the right such that now it's no longer timestamp, language, problem. Maybe it's timestamp, problem, language, or maybe timestamp is all the way over to the right.
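The header-skipping step and the favorite variable described above can be sketched as follows. This assumes, as the later audience question suggests, that the skip is done with next(reader); the sample data is made up and languages is a hypothetical helper name.

```python
import csv
import io

# Made-up sample rows standing in for favorites.csv; header names are assumed.
SAMPLE = (
    "Timestamp,language,problem\n"
    "10/20/2025 13:30:05,Python,Hello\n"
    "10/20/2025 13:30:09,C,Mario\n"
)


def languages(file):
    """Collect the second column of every row after the header."""
    reader = csv.reader(file)
    next(reader)  # the reader is stateful: this consumes the header row
    favorites = []
    for row in reader:  # iteration resumes at the second line of the file
        favorite = row[1]  # a named variable documents what row[1] means
        favorites.append(favorite)
    return favorites


print(languages(io.StringIO(SAMPLE)))
```

Because the reader remembers its position, the for loop never revisits the row that next already returned.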
You could imagine therefore that the indices we're using, 0, 1, and 2, could be a little fragile, because if someone changes the data on me, now my code is just going to break because I am blindly assuming that the second column, aka bracket 1, is going to be the language column, but that might not be the case. But there's an alternative to this, and you might recall having seen this before. I'm going to go into favorites.py and tweak my code a little bit not just to use a reader but a dictionary reader. So I'm going to change this to DictReader instead of just reader. And then the upside of using a dictionary reader is that every time I go through this loop reading row by row by row, each row that I'm handed by this reader is not going to be a list anymore that's numerically indexed with zeros and ones and twos. Each row is going to be, as you might guess, a dictionary, which is a collection of key-value pairs, which means now we can use words as our indices instead of just numbers. Which is to say, if I switch from reader, which gives me lists, to DictReader, which gives me dictionaries, I can change this line 10 now and say I specifically want the language column, wherever it is, all the way to the left or the middle or the right. So in general, using a dictionary reader is probably just going to be more robust because it's resilient against changes in that actual numeric ordering. All right, let me pause here to see first if there's any questions on this exercise, whose purpose in life is just to demonstrate how we can download the CSV data, then iterate over it line by line without actually analyzing it yet. No? Okay. So let's ask maybe the most natural question, which is like how many people prefer Python? How many people prefer C or Scratch in turn? In other words, how can we recreate in our own code what Google Forms is doing for us graphically with those pie charts? Well, I think what we could do is write some code logically that essentially relies on this mental model.
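To illustrate why the dictionary reader is more robust, here is a small sketch that reads two made-up versions of the same data, one with the columns rearranged. The header name language is assumed to match the file, and favorite_languages is a hypothetical helper name.

```python
import csv
import io

# The same made-up responses, with the columns in two different orders.
ORIGINAL = "Timestamp,language,problem\n1,Python,Hello\n2,C,Mario\n"
REORDERED = "problem,Timestamp,language\nHello,1,Python\nMario,2,C\n"


def favorite_languages(file):
    """Look up each row's language by column name rather than by position."""
    reader = csv.DictReader(file)  # uses the first row as the dictionary keys
    favorites = []
    for row in reader:
        favorite = row["language"]
        favorites.append(favorite)
    return favorites


print(favorite_languages(io.StringIO(ORIGINAL)))
print(favorite_languages(io.StringIO(REORDERED)))
```

Both calls produce the same list, whereas a version indexing with row[1] would return timestamps for the reordered data.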
What I have here is an opportunity to use a bunch of key-value pairs, because if I want to know how many instances of Python there are, and C and Scratch, well, those might as well be three keys, the values of which are hopefully going to be three numbers that represent the counts of the popularity of each of those languages. So in memory, I essentially want to construct something that looks like this, as I would if I were doing this on a chalkboard. But recall that this mental model maps perfectly to the notion of a Python dictionary, because a dictionary in Python is indeed key-value pairs. And we've seen it already because that's how the dictionary reader works. But we could certainly use our own dictionaries to solve this same problem ourselves. So the goal at hand is to count the number of people who said Python and C and Scratch, respectively. So how to do this? Well, I think what I could do is... Oh, and actually let me delete this line. Because we are using a dictionary reader, we no longer need to skip the first row. It is automatically consumed by the dictionary reader for us. So, this now would be the better version of the dictionary reader. Let's go ahead and do this. Let me declare some variables first that will store for me the total number of people who said Python, Scratch, and C, respectively. So, I could say Scratch equals 0, C equals 0, Python equals 0. And I could just set three variables equal to 0, 0, and 0. If you haven't seen it before, there are some Pythonic tricks you can do here. If you've got three variables that you want to initialize all at once because it's that simple, you could alternatively do scratch, c, python equals 0, 0, 0. This too would have the intended effect, and it looks a little better because it's all a simple one-liner. But what do I want to do now? Well, down here, let's go ahead and do a simple conditional before we enhance this by using an actual dictionary.
Let me go ahead and say if the current favorite in that reader equals equals Scratch, well, let's go ahead and increment the scratch variable by doing plus equals 1, as we saw last time. Else if the favorite in the current row equals equals quote unquote C, well, let's go ahead and then increment the C variable by one. Else if the favorite equals equals Python, then let's go ahead and increment Python by one with plus equals instead. I could technically get away with saying else here, but I'm consciously this time not trying to over-optimize this, because if someone changes the form, maybe next semester and whatnot, and we're asking about a fourth language, I wouldn't want my code to assume that anything that isn't Scratch or C must be Python when there could be some future fourth language. So, this is a little more robust, and in this case, we'll just ignore anything that isn't Scratch or C or Python. All right, at the end of this, let's go ahead and not just print out the favorite, but outside of the for loop, let's go ahead and print out, for instance, the Scratch count is this. Then, let's go ahead and print out the C count is this. And then let's print out the Python count is this. But, of course, there's a subtle bug here. Yeah. Ah, so I didn't format these things as f-strings. So I need the little f over here to the left of each of these strings. All right, so let me go ahead and maximize my terminal window, run python of this version of favorites.py, and hopefully what we'll see is not every row again and again and again, but three lines of output, giving me the total counts instead. All right, this seems to line up with the rough percentages that we saw coming in earlier on Google Forms. 109 of you like Python, followed by 58 of you in C, and 24 of you preferring Scratch instead. All right, but why does this perhaps rub you the wrong way?
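A rough sketch of this three-variable version follows. The sample data is made up (including a hypothetical fourth language, to show that unrecognized answers are ignored), the header names are assumed, and count_languages is an illustrative helper name.

```python
import csv
import io

# Made-up sample rows; the last one is a hypothetical future fourth language.
SAMPLE = (
    "Timestamp,language,problem\n"
    "1,Python,Hello\n"
    "2,C,Mario\n"
    "3,Python,Cash\n"
    "4,Scratch,Starting from Scratch\n"
    "5,SQL,Movies\n"
)


def count_languages(file):
    """Tally Scratch, C, and Python answers with one variable apiece."""
    scratch, c, python = 0, 0, 0  # initialize all three in a single line
    reader = csv.DictReader(file)
    for row in reader:
        favorite = row["language"]
        if favorite == "Scratch":
            scratch += 1
        elif favorite == "C":
            c += 1
        elif favorite == "Python":  # an explicit elif, not a catch-all else
            python += 1
    return scratch, c, python


scratch, c, python = count_languages(io.StringIO(SAMPLE))
print(f"Scratch: {scratch}")  # the f prefix is what makes {scratch} interpolate
print(f"C: {c}")
print(f"Python: {python}")
```

Using elif rather than else for the Python branch is what keeps the made-up SQL row from being miscounted as Python.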
I already alluded to the fact that we're going to get rid of this, but why is this not the best design, just using three variables like this? Yeah? >> Different categories. >> Yeah, exactly. If we were to add a bunch more languages, a fourth one, a fifth one, a sixth one, a 10th one, a 20th one, like having that many variables is just certainly going to look unwieldy and it shouldn't rub you the right way. At that point, we should really be graduating to some proper data structure, whether it was an array in C or, better still in Python, an actual dictionary. So, let's do that instead. Let me go ahead and, in a newer version of this file, let's get rid of these individual variables and let's just have a generic variable called counts, for instance, and set it equal to an empty dictionary. And just using two curly braces will give me an empty dictionary. Or if you want to be more pedantic, you can actually call the dict function, which will return to you an empty dictionary. I'd argue though that most people would probably just use the double curly braces like this to indicate that here comes a dictionary for me. Now, how do I use this? Well, I don't need to update three separate variables. I think I could just do something like this. I could say, once I've determined what the current row's favorite value is for language, I could say counts bracket favorite. So, use the current string as an index into the dictionary. So, it's going to be quote unquote Scratch or C or Python. And then just increment that by one. And then down here, we don't have these variables anymore. So, I'm going to go ahead and instead say, how about this? We'll use a loop: for each favorite in those counts, let's go ahead and print out, how about, the favorite value and the counts thereof, without any f-string. Okay.
So the only thing that's different is I'm using a dictionary here, which is essentially the code version of this two-column chart, whose keys are going to be the favorite strings, Scratch or C or Python, the values of which are going to be the actual counts, and I'm just doing some simple math by plus-plus-ing or incrementing the count each time I see a certain language. Unfortunately, this code is not quite going to work. Let me go ahead and run python of favorites.py and, dang it, there's a KeyError. Let me minimize the terminal window so we can see both at once. Why is there a KeyError, apparently on line 11, wherein I'm indexing into the counts dictionary? What's going on? Yeah, >> the key already exists. >> Yeah, it's a little subtle, but if this is like the very first time through the file, there is no key Python. There is no key C or Scratch because no one has put them there. And yet recall that plus equals means you're going to that location in the dictionary and just blindly incrementing it. But what is it? Well, it's effectively a garbage value. But it's not even that, because there's no actual key there. So we need to do a little bit of logic here. And we can solve this in a couple of ways. Well, I could say something very pedantically like this. I could just say, well, if this favorite is in the counts dictionary, this is the Pythonic way to ask that question: is this key in this dictionary? If so, well, then it's safe to go ahead and increment it just as I've done before. But if it's not, what I think I want to do is set counts bracket favorite equal to one instead, because either I want to increment the current count by one or this is the first time, logically, I've seen this favorite, so I want to set it equal to one instead. We could do this a different way logically, just like we could in C solve problems differently. I could instead say something like this.
I could get rid of all this code and just say if favorite not in counts, then I could say counts bracket favorite equals zero. So just always initialize it to zero if it's not there. Now I can safely, blindly update the count by one because now I know, no matter what, once I get to line 13, that count is actually there. All right, so let's see with this version of the code. Let's go ahead and clear my terminal window, rerun python of favorites.py. Cross my fingers. And there we go. Python and Scratch and C. Interestingly, the order switched around this time, based on the order in which I was inserting things into the dictionary. But we'll see how we can exercise a bit more control over that. But let me propose one more approach to that KeyError. Recall, we discussed briefly last week that whenever you have these kinds of tracebacks that refer to certain exceptions, like exceptionally bad situations that can happen, you can also change your code to just try to do something and then try to catch the exception instead. So an alternative way to do what we initially did would be this. Instead of just blindly saying go into the counts dictionary, index into it at the favorite key and increment it by one, what we could do is try to do that, please, except if there is a KeyError, in which case, you know what, go ahead and just initialize that value to one instead. So, in short, there's like four different ways already to solve the same problem. Whichever way you prefer is quite reasonable. This is just another way, and arguably another Pythonic way, to do things by trying to do something but anticipating that something in fact can go wrong. >> A while ago you removed... >> A while ago what? >> You removed next reader. >> Correct. A while ago I removed next reader because that was only necessary for csv.reader, because that was just reading every row again and again.
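The dictionary-based approaches to counting just discussed can be compared side by side in the sketch below. It is not the lecture's exact file: the three function names are made up, and a plain list of made-up answers stands in for the rows of favorites.csv.

```python
def count_if_else(favorites):
    """Increment if the key exists, otherwise start it at one."""
    counts = {}
    for favorite in favorites:
        if favorite in counts:
            counts[favorite] += 1
        else:
            counts[favorite] = 1
    return counts


def count_init_zero(favorites):
    """Initialize a missing key to zero, then always increment."""
    counts = {}
    for favorite in favorites:
        if favorite not in counts:
            counts[favorite] = 0
        counts[favorite] += 1
    return counts


def count_try_except(favorites):
    """Try to increment, and handle the KeyError on a first sighting."""
    counts = {}
    for favorite in favorites:
        try:
            counts[favorite] += 1
        except KeyError:
            counts[favorite] = 1
    return counts


# Made-up answers standing in for the language column.
answers = ["Python", "Scratch", "Python", "C"]
counts = count_try_except(answers)
for favorite in counts:
    print(favorite, counts[favorite])
```

All three functions build the same dictionary, and its keys come back in the order in which they were first inserted, which is why the printed order depends on the data.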
But when you use a csv dictionary reader, that automatically consumes the first row because that's how the dictionary reader knows what the columns will be called, and so you don't have to skip over it yourself. A nice enhancement. Other questions on what we've just done here? All right, so let me propose that like writing this amount of code is kind of annoying just to ask a relatively simple question like what's the most popular language in this file, right? It's been nice, but it's sort of a step backwards from Google spreadsheets and Apple Numbers and Microsoft Excel, where you could really just like highlight the column and it would just tell you the answer, usually in the bottom right-hand corner, or you could use a function in one of those spreadsheet tools to ask the same question. So, it's starting to feel like, with almost 20 lines of code, like maybe there's a better way. And I dare say there is. Rather than use a flat-file database, let's graduate already to what the world calls a relational database. And a relational database is simply data in which you define relations among your data, which isn't so much relevant now except that that timestamp is associated with that language, which is associated with that favorite problem as well. But we'll see that data sets can be much larger and more complicated. And it might be valuable if we can actually express relationships across multiple pieces of data. In particular, let's introduce already a programming language called Structured Query Language, or SQL for short, aka sequel. And SQL essentially only has four fundamental operations. So even though we're transitioning into a new language, by the end of today, we're going to transition out of the new language because there's only so much you can do. Now, as with any language, it's going to take time and practice to sort of get the hang of it. But take comfort in knowing that SQL really just supports four fundamental operations.
And the acronym that the world uses is indeed CRUD, which stands for create, read, update, and delete. That is to say, when using a relational database, you can create data, read data, update data, or delete data. And that's pretty comprehensive as to what's possible. Now, what is an actual database? Well, generally speaking, a database is just a piece of software that's running on a computer somewhere, inside of which is stored a whole lot of data. And that database therefore provides you with access to that data at any time, whether it's on your local Mac or PC, somewhere in the cloud, or on a whole cluster of web servers, which we'll talk about in the weeks to come as we transition from command-line tools to the web. Now, technically in SQL, the commands you actually use to implement this idea of creating, reading, updating, and deleting data are almost the same. But for whatever reason, the world chose the command SELECT, which is equivalent to reading data. So we'll soon see that there's a command in SQL that lets us select data, which is equivalent to this idea of reading it, whereas the other three operations refer, of course, to writing data, that is, changing data. Technically speaking, we'll also be able to INSERT data into a database, as we'll soon see, and we'll be able to DROP data altogether: not just delete individual rows, but whole tables, so to speak, of rows instead. So what does this all mean? Well, let's go ahead and do an example of using SQL to ask some relatively simple questions and begin to develop some muscle memory for using this new language. If I were to manually load a bunch of data into a proper database for SQL, I would actually use code like this. I would literally type CREATE TABLE. Then I'd come up with the name of the table, aka sheet, and then I would specify every column that I want to put in that table. And here's where the vernacular changes. 
So whereas in the world of spreadsheets you have sheets, or tabs, that contain rows and columns, in the world of databases you have tables, which are just rows and columns. It's different terminology, but it refers to conceptually the same thing. In CS50, we're going to use a specific version of SQL known as SQLite, which is like a lightweight version of SQL that's actually very commonly used in web applications and in mobile applications, but it doesn't have all of the bells and whistles or all of the scalability of your Oracle, SQL Server, Microsoft Access, Postgres, or MySQL. Those are just product names, open source and commercial alike, which, if you've ever heard of them, just represent bigger, faster versions of SQL databases. But we'll indeed use the lightweight version known as SQLite. And the command we're going to start to run is quite literally sqlite3, which is version three of the same command, which we've pre-installed into your codespaces for you. So, let's go ahead and do this. Let me go ahead and run a command called sqlite3, which is going to let me create my very first SQLite database, and I'm going to import into that database the CSV file that we downloaded from Google Forms. In other words, I'm going to load that same data set into a different program, an actual database, so that I can use a completely different programming language to ask questions about it instead of writing, as we just did, some Python code. So, let me go back into VS Code here. Let me close my CSV file and my Python file. Let me reopen my terminal window, and let me go ahead and run sqlite3, space, and then the name I want to give to this database, which will be favorites.db, .db for database by convention. Enter. I'm going to be prompted to make sure I want to create this new file. Y for yes. Enter. And now I'm inside of the database, running a command at a prompt that now says sqlite and then an angle bracket. 
I'm not going to be using any .sql files for now, although you can actually write SQL code in separate text files. I'm actually going to use the database's interactive interpreter to just run all of the commands I want interactively by typing them out: semicolon, Enter; type it out, semicolon, Enter; back and forth. But you can save all of these commands, as you'll see in problem set 7, in files as well. Now, how do I go about actually importing that CSV file into this lightweight database? Well, for this, I'm going to execute three commands. And any command in SQLite that starts with a dot is specific to SQLite, this lightweight version of SQL. Anything that doesn't start with a dot is generalizable and will work on most any SQL database anywhere in the world, no matter the product you're using. So, in my SQLite terminal, I'm going to change my mode to CSV mode, with .mode csv, just to tell the database that I want to load some CSV data. I'm going to then literally .import that data from a file called favorites.csv, which is the file we downloaded earlier and then uploaded to my codespace. And now I have to specify the name of a table. So I'm going to call this table, aka sheet, favorites, just to keep everything consistent. And that's it. In the absence of an error message, everything probably worked fine. I'm going to do .quit. That quits out of SQLite. But what you'll now see if I type ls is that not only do I have favorites.csv, which I uploaded, and favorites.py, which we wrote a few minutes ago, but I also now have favorites.db, which is a database version of that same file. Now, I can't actually see what's inside of it, because if I go ahead and run code of favorites.db, I'm going to see that this file is not displayed in the text editor because it is either binary or uses an unsupported text encoding. 
This is to be expected, because this database is stored essentially in the form of zeros and ones that the sqlite3 program knows how to read, but it's not something that VS Code can just show me. And generally, storing data in binary is going to be more efficient than storing things purely textually, because we're going to be able to use various data structures and algorithms that we've been talking about for weeks more easily on that binary data. All right, so let's go ahead now and see what this import command did. I'm going to again maximize my terminal window. I'm going to go ahead and run sqlite3 again, passing in favorites.db. Enter. This time it already exists, so it just opened it without prompting me. And now I'm going to go ahead and type another SQLite-specific command called .schema. The schema of a database is just the design of the database. What does it look like? What are the rows and columns and tables therein? So if I type .schema, what I'm going to see is this SQL command: CREATE TABLE IF NOT EXISTS "favorites", which is the name of the table. Then, in parentheses, there are apparently three columns, the first of which is called Timestamp, the next of which is called language, the third of which is called problem. And each of those columns is going to be raw TEXT. Now, we'll soon see that it doesn't have to just be text. But when I use the .import command, this is the default table that SQLite created for me. Soon we'll see that I can exercise more control, especially over the types of data that I'm putting in this database. But what's really nice about the .import command is that it could not be easier to convert a CSV file to a SQLite database, so that now, as we're about to see, we can use SQL on it instead of Python or any other language. Okay. So how do we go about getting data from this database? Well, the first of our commands that we'll explore is that one called SELECT. 
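For readers who want to see the import and schema steps spelled out, here is a rough sketch in Python using the standard-library csv and sqlite3 modules. It is not what the sqlite3 command-line tool does internally, just an approximation of the same idea: read the header row, create a table whose columns are all TEXT, and insert every remaining row. The three CSV rows are made up for illustration, the capitalization of the Timestamp header is an assumption, and the database lives in memory rather than in favorites.db.

```python
import csv
import io
import sqlite3

# Made-up stand-in for favorites.csv; the real file came from Google Forms.
CSV_TEXT = '''Timestamp,language,problem
t1,Python,"Hello, World"
t2,C,Mario
t3,Scratch,Starting from Scratch
'''

db = sqlite3.connect(":memory:")  # pass "favorites.db" to write a file instead

reader = csv.DictReader(io.StringIO(CSV_TEXT))
columns = reader.fieldnames  # DictReader consumes the header row for us

# Like the .import command, declare every column as TEXT.
column_defs = ", ".join(f'"{name}" TEXT' for name in columns)
db.execute(f'CREATE TABLE IF NOT EXISTS "favorites" ({column_defs})')

# One "?" placeholder per column, then one INSERT per CSV row.
placeholders = ", ".join("?" for _ in columns)
db.executemany(
    f'INSERT INTO "favorites" VALUES ({placeholders})',
    ([row[name] for name in columns] for row in reader),
)

# Roughly what .schema shows: the CREATE TABLE statement SQLite stored.
schema = db.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'favorites'"
).fetchone()[0]
print(schema)

row_count = len(db.execute("SELECT * FROM favorites").fetchall())
print(row_count)
```

The dot-commands themselves (.mode, .import, .schema) only exist inside the sqlite3 prompt, which is why the sketch falls back on ordinary SQL statements.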
So to SELECT data means to read data from the database. And in this sense, it's going to be a declarative language, because I'm just going to declare what data I want to select from the database. And I'm not going to worry about opening the file anymore, or iterating over it with a for loop or a while loop, or defining variables or the like. I'm just going to select, syntactically, what I want. So let me go back to SQLite here. Let me clear my terminal just to get rid of the past commands. And let's do the first of these: SELECT * FROM favorites. And I regret to say the semicolon is back for the SQL code we're now writing. Enter. And we will see a sort of ASCII art version now, so even better than the raw CSV file, of all of the data that was imported into this table. So SELECT * FROM favorites is apparently selecting everything. The star in this context is a wildcard of sorts that represents all of the columns in the table. The table itself is called favorites. So I'm selecting all of the columns from the table called favorites. And here you have it with sort of simple ASCII art: first column, second column, third column, chronologically listed, because that's exactly how it was loaded into the database. All right, so if star is a wildcard, what more can we do? Well, if you don't care about all of the columns, you can actually be a little more specific. So I could say instead, SELECT just the language column FROM the favorites table, semicolon, Enter. And now I have just a single column of data that shows me one cell for every submission, but not the timestamp or the favorite problem that that person put in. Or, if I want a couple of columns, I can say SELECT language, problem FROM favorites, since I don't care about the timestamp, and now you get two columns instead. 
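The three SELECT statements just described can be tried outside the sqlite3 prompt as well. The following is a small sketch using Python's sqlite3 module against an in-memory table; the three rows and the Timestamp column name are placeholders invented for the example, not the class's actual responses.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (Timestamp TEXT, language TEXT, problem TEXT)")
db.executemany(
    "INSERT INTO favorites VALUES (?, ?, ?)",
    [  # made-up sample rows
        ("t1", "Python", "Hello, World"),
        ("t2", "C", "Mario"),
        ("t3", "Scratch", "Starting from Scratch"),
    ],
)

# The star is a wildcard for "every column".
all_columns = db.execute("SELECT * FROM favorites").fetchall()

# Naming columns narrows the result to just those columns.
one_column = db.execute("SELECT language FROM favorites").fetchall()
two_columns = db.execute("SELECT language, problem FROM favorites").fetchall()

for row in two_columns:
    print(row)
```

Each query returns a list of tuples, one tuple per row, with as many elements as there were columns selected.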
So in short, rather than write the dozen or so lines of code that we wrote earlier with Python to open the file and then iterate over it with a reader, we just select what data we want from this here database. But even more powerfully, SQL comes with a whole bunch of functions built in, quite like the spreadsheet software that you and I are already familiar with in the real world, like Excel and Numbers and Google Sheets. SQLite comes with an AVG function, a COUNT function, DISTINCT, LOWER, MAX, MIN, UPPER, and so forth. There's a whole list of them. We'll play around with just a couple of these. If we want to transform some of this data, let me go back into VS Code, clear my SQLite terminal, and suppose I just want to get the total number of rows in the favorites table, like, how many people had buzzed in at the moment I downloaded the file, even if not everyone had quite buzzed in yet? Well, I could say SELECT the COUNT of all of the rows FROM the favorites table, semicolon. And now I'll get back a single cell, which tells me 272 submissions had come in the moment I downloaded that file. Suppose I want to see, just to confirm that no one submitted bogus data, which languages were actually among those typed in. Well, I can SELECT only the DISTINCT languages that were typed in from the favorites table. And now I get a unique list of languages that everyone buzzed in with, irrespective of how many times. If I want to get how many distinct languages there are, if it's not as obvious as three here, I could SELECT the COUNT of DISTINCT languages from the favorites table, and it would just tell me the answer. Three is the total number of languages that are distinct in those submissions. So again, it's easy to just eyeball this, but very quickly, single statements that are sort of English-like, left to right, are enabling me to just select the answers I want to some of these problems. Well, what more can SQL do? 
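As a concrete illustration of COUNT and DISTINCT, here is a sketch with Python's sqlite3 module. The six rows are invented for the example, so the totals differ from the 272 submissions in the lecture; only the shape of the queries is the point.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
db.executemany(
    "INSERT INTO favorites VALUES (?, ?)",
    [  # made-up sample rows: three Python, two C, one Scratch
        ("Python", "Hello, World"),
        ("Python", "Mario"),
        ("Python", "Mario"),
        ("C", "Hello, World"),
        ("C", "Mario"),
        ("Scratch", "Starting from Scratch"),
    ],
)

# Total number of rows in the table.
total = db.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]

# Each language once, no matter how many rows mention it.
languages = {row[0] for row in db.execute("SELECT DISTINCT language FROM favorites")}

# How many distinct languages there are.
distinct_count = db.execute(
    "SELECT COUNT(DISTINCT language) FROM favorites"
).fetchone()[0]

print(total, languages, distinct_count)
```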
Well, here is a bunch of other keywords that we can add to our SQL commands that allow us to control further what kind of data we're going to get back. We're going to be able to group data by similar values. We're going to check not just for string equality, but for fuzzy matching, checking if something is close to a string that we're looking for. We can limit the total number of rows coming back. We can order, or sort, the data by a certain column. And we can actually have predicates, so to speak, using WHERE, which is similar in spirit to an if condition, but a little more succinctly written. So, for instance, let me go back to VS Code here. Let me clear my terminal again, and let me go ahead and select how many of you answered that C is your favorite language. Without selecting all of the counts again, let's just hit the nail on the head. So let's SELECT the COUNT of rows FROM the favorites table WHERE the language selected equals 'C', semicolon. And I get back a simple answer: 58 of you buzzed in with the answer C. How many of you liked both C and, very specifically, the problem called Hello, World? If that was the extent of your passion for code, let's go ahead and SELECT COUNT(*) FROM favorites WHERE the language you typed in equals 'C' AND the problem you typed in equals 'Hello, World', semicolon. And it looks like five of you said your favorite language was C and your favorite problem was Hello, World. Great. All right, so it's getting a little more interesting. What about the other version of Hello, World, where we called it Hello, It's Me? Well, that one's interesting, because I think it's going to break my convention of using single quotes, which would be the convention here in SQL. Whenever you're using a raw string, single quotes would be the norm. But let's type this out. So, SELECT COUNT(*) FROM favorites WHERE language equals 'C'. 
And the problem this time equals 'Hello, It's Me'. So, at a glance, this is probably going to confuse sqlite3, because does that middle apostrophe belong to the first one or the second one? This is ambiguous. And this is weird. In C, we would solve this problem by putting a backslash in front of it, a so-called escape character. Different languages have different conventions. This one's a little weird, but in SQLite, what you instead do is double the single quote. So putting two single quotes is the convention for escaping a single quote. You've just got to remember or Google these kinds of things in the real world if you forget. Enter. Now I get back zero. So it was not the case that any of you liked both C and that problem specifically. Well, what if we want to be a little more inclusive of either Hello problem? Well, I could do it this way. Just like in my codespace's terminal, I can go up and down to go back through my history. Same thing in SQLite. So I can go back two commands to get up here, and let me go ahead and write something longer, where the problem equals 'Hello, World' OR the problem equals 'Hello, It', double apostrophe, 's Me', single apostrophe, semicolon. Oh, and parentheses. So it's wrapped onto two lines here, so it's a little messy, but I'm just logically saying where you buzzed in with C as your language and a problem of Hello, World or a problem of Hello, It's Me. Enter. It should be the same answer as before, because none of you liked Hello, It's Me. But I chose this syntax because I can actually make this a little cleaner. I can go and delete this whole parenthetical and just say WHERE language equals 'C' AND the problem is LIKE, single quote, Hello, percent sign, single quote, semicolon. So this is a little weird too. It's just how SQL does this. But whereas previously I was using an equal sign to check for literal string equality, like literally those problem names, LIKE allows me to use wildcards. 
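The predicates walked through above (equality, AND, a parenthesized OR, the doubled single quote, and LIKE) can be compared side by side. This is a sketch using Python's sqlite3 module with five invented rows, so the counts are not the lecture's 58 and 5. Note that the apostrophe only needs doubling when it is written inside the SQL text itself; values passed separately through the ? placeholders need no escaping.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
db.executemany(
    "INSERT INTO favorites VALUES (?, ?)",
    [  # made-up sample rows
        ("C", "Hello, World"),
        ("C", "Hello, World"),
        ("C", "Mario"),
        ("Python", "Hello, It's Me"),
        ("Python", "Hello, World"),
    ],
)


def count(where):
    """Return COUNT(*) for rows of favorites matching the given WHERE clause."""
    return db.execute(f"SELECT COUNT(*) FROM favorites WHERE {where}").fetchone()[0]


c_only = count("language = 'C'")
c_hello_world = count("language = 'C' AND problem = 'Hello, World'")

# Two single quotes in a row stand for one literal apostrophe.
c_its_me = count("language = 'C' AND problem = 'Hello, It''s Me'")

# Parentheses group the OR so that the AND applies to both alternatives.
c_either = count(
    "language = 'C' AND (problem = 'Hello, World' OR problem = 'Hello, It''s Me')"
)

# LIKE with % matches any problem that starts with "Hello, ".
c_like = count("language = 'C' AND problem LIKE 'Hello, %'")

print(c_only, c_hello_world, c_its_me, c_either, c_like)
```

The count helper is only a convenience for this example; it builds SQL from fixed strings written in the program, which would not be a safe pattern for text typed in by users.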
And it's not a wildcard quite like the previous use of the asterisk that we saw. When you are using a wildcard in a string in SQL, you say percent sign to represent zero or more characters there. So 'Hello, %' is going to hopefully match this or the other problem that started with Hello. So let me go ahead now and hit Enter. The answer is still going to be the same, but indeed it's demonstrative that that is how you could express yourself a little more generally if you wanted a pattern match like that. Questions now on any of these techniques? Yeah. >> [question about capitalization] >> Good question. Does it have to be capitalized when doing string equality? Yes, but not with LIKE. LIKE is case-insensitive, so uppercase or lowercase. >> But like COUNT and everything? >> Oh, I see. Good question. So, the capitalization. Stylistically in SQL, I would argue, and this is a stylistic convention in SQL, certainly for CS50 and also for a lot of companies and communities in the world, that you uppercase your SQL keywords just to make them stand out from words that you and I chose, like the name of the table or the names of the columns therein. This is just a convention. I would propose, as always, that you be consistent, but for CS50 and for style50's sake, I would propose that you indeed capitalize like this. And frankly, it just makes it easier to read, to my eye, because the SQL stuff jumps out and then the lowercase stuff is specific to your data set. A good question. All right. How about another set of keywords that we saw on the screen earlier, namely grouping by? Well, suppose we have a data set like this, whereby... how does this go? Happy Halloween. Whereby here's just an excerpt from that table. So as far as languages go, say one of you liked C, three of you liked Python, and then, now that we're introducing SQL, let's imagine that two of you now like SQL even better. 
So that's the extent of the data set. Wouldn't it be nice to be able to figure out how many of you like C or Python or SQL? Well, I could write some Python code, open the file, iterate over it using variables, using a dictionary, and those, what, 20 or so lines of code we wrote earlier to answer this question. Wouldn't it be nice to just ask the SQL language to figure out how many of you like C, how many of you like Python, how many of you like SQL? We can do this by grouping these cells by common values. Let's group all of the Python rows together and all of the SQL rows together and, even though there's just one, all of the C rows as well. So, how can we do this? Well, let me go back to VS Code here and clear my terminal. And let's do this. Let's SELECT every language, but its respective COUNT as well, FROM the favorites table. But before you do any of that, GROUP everything BY language. So this one takes a little more practice and getting used to, but this is simply saying: look at the languages, essentially group all of the common languages together, and then figure out what count that gives you for each group of rows. If I hit Enter here, we'll get an answer just like the Python code that took me 20 lines to write earlier. What's really happening in the database, though, is something a little bit like this. Notice, of course, that there's only one row with C. There are then three rows with Python, and there are two rows with SQL. And the table I'm essentially building groups all of those by identical values and then spits out the total counts here. Now, on the screen, it's just one, three, and two. In the data set with some 200-plus responses, we have much larger answers, including Scratch instead of SQL right here. But this now sort of speaks to just how much more convenient it is if you want to ask a question like that, especially if the data set is more than a couple of hundred rows. 
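The grouping described above can be reproduced with the same tiny excerpt used on screen: one row of C, three of Python, two of SQL. This is a sketch using Python's sqlite3 module; only the language column is modeled here, which is a simplification of the actual favorites table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT)")
db.executemany(
    "INSERT INTO favorites VALUES (?)",
    [("C",), ("Python",), ("Python",), ("Python",), ("SQL",), ("SQL",)],
)

# GROUP BY collapses rows that share a language into one row per language;
# COUNT(*) then reports how many original rows fell into each group.
rows = db.execute(
    "SELECT language, COUNT(*) FROM favorites GROUP BY language"
).fetchall()

counts = dict(rows)
print(counts)
```

The resulting dictionary plays the same role as the counts dictionary built by hand in favorites.py, except that the database did the tallying.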
If your boss, for instance, in the real world has a CSV data set and wants you to analyze the data, well, you can literally download it, import it into SQLite, run one command, and boom, you've got this analysis done, if the extent of it is just to group the data and figure out what kinds of counts you have in the data set. All right, what else can we do? Well, we can play around with this a bit more. Let me go back here into VS Code and propose that we could order those results in more than just the default way. So let's go ahead and SELECT the language and the COUNT FROM the favorites table yet again. Let's GROUP BY language yet again, but this time let's ORDER BY the COUNT column in descending order. So it's a bit more of a mouthful, and it takes some practice to memorize all of the syntax, but when I hit Enter now, I get back the same answers, but Python is at the very top of the list. Now, COUNT(*) isn't necessarily all that self-explanatory, and indeed, it's a little annoying that I have to write out COUNT(*) here at top right as well as at the beginning. So it turns out SQL also supports aliases. If you want to change the temporary name of the column to be something else, like n for number, well then I can define an alias with the keyword AS, then ORDER BY n at the end of this statement, and then hit Enter and get back the same results too. And so, if it's not implicitly clear already, each of these SQL SELECT commands is essentially giving me back a temporary table. This is not being saved anywhere. It's gone from the computer's memory once I've actually gotten my answer. But it's essentially returning a subset of the tables that do exist in the computer's memory, because that's what the import command did for me. It loaded the whole data set into memory. And now I have these temporary tables that just contain the answers to questions I care about. 
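To make the ordering and the alias concrete, here is a sketch with Python's sqlite3 module, reusing the on-screen excerpt of one C, three Python, and two SQL rows rather than the real responses. The two queries are meant to return identical results; the second simply names the count column n so that it does not have to be spelled out twice.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT)")
db.executemany(
    "INSERT INTO favorites VALUES (?)",
    [("C",), ("Python",), ("Python",), ("Python",), ("SQL",), ("SQL",)],
)

# Sort the grouped counts from largest to smallest.
ranked = db.execute(
    "SELECT language, COUNT(*) FROM favorites "
    "GROUP BY language ORDER BY COUNT(*) DESC"
).fetchall()

# Same query, with AS giving the count column the temporary name n.
ranked_alias = db.execute(
    "SELECT language, COUNT(*) AS n FROM favorites "
    "GROUP BY language ORDER BY n DESC"
).fetchall()

print(ranked)
print(ranked_alias)
```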
And if you only care about the top one language, well, there's a LIMIT keyword too. I can literally just say LIMIT 1 at the end of that exact same statement. Enter. And now I've got a single answer to my question: a single row saying Python was the most popular, with 190 people selecting that. All right, for now I think that's enough on SELECT. There are a few more keywords, but it really is just a matter of composing these building blocks. Questions, though, on these capabilities fundamentally? All right. Well, how about maybe inserting data instead? So here might be the canonical way to insert a row into a table in SQL. You literally say INSERT INTO, then the name of the table, then in parentheses the one or more columns for which you have data, and then literally the word VALUES, and then in another set of parentheses a comma-separated list of the one or more values that you want to insert into those there columns. So, for instance, let me go back into VS Code here. And of course, at the time we circulated this form a few minutes ago, we had not yet assigned problem set 7. But in problem set 7 is a problem called Fiftyville, which, let's propose, might very well be someone's favorite in a week. So let's go ahead and insert that row now, preemptively. Let's INSERT INTO the favorites table two columns, language and problem. Why? Well, I don't really care to figure out what the timestamp is and the format thereof, so I'm just going to omit the timestamp altogether. But the VALUES I'm going to insert for this new row are going to be 'SQL', comma, 'Fiftyville', close parenthesis, semicolon, Enter. Nothing bad seems to have happened. Let me go ahead and SELECT * FROM favorites just to see what my data set looks like now. And indeed, at the bottom of the table, there is that new row. But what's sort of noteworthy is that this isn't just blank. 
There's our old friend NULL, which is not a null pointer. It's the same word literally, N-U-L-L, and it refers explicitly to the absence of data. And this is actually a nice feature, because if any of you have ever used Google Spreadsheets, Apple Numbers, or Microsoft Excel and looked at cells that are blank, what does it mean if a spreadsheet cell is blank? Does it mean there's literally no data there? Does it mean that you just don't have the data there, or it's missing in some form? Well, how do you address that? Well, maybe you put N/A, in English, for not available or something like that, but that's kind of hackish. And if you use N/A, that might mean that no one can actually type N/A as their answer. And so what's nice about SQL and database languages more generally is that NULL signifies the conscious omission of data. It's not just a missing value. It's consciously not there. It's not just the empty string, quote unquote, for instance. So we might see different examples of that. But what's nice now is that I can distinguish NULL from other values. And in fact, if it is not a good idea to have any data in my data set that is NULL, for whatever reason, like it just looks like bogus data, and it would be nice to know who inserted that when, no problem. We can also delete data from a table in SQL. And I can DELETE FROM the name of the table WHERE some condition is true. So, for instance, if I want to delete that row, I can do this in a couple of ways, but perhaps the simplest is to DELETE FROM favorites WHERE Timestamp IS NULL, semicolon. So IS, too, is another SQL keyword here. And that will go ahead and delete only those rows where the timestamp is NULL. Enter. Let's do the same SELECT command as before. Enter. And voila, that row is now gone. Be very, very, very careful with DELETE statements. If I had foolishly done this, want to guess what the results would be? It would delete everything. 
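The INSERT that omits a column, the NULL that results, and the DELETE that targets it can be traced in a few lines. This is a sketch using Python's sqlite3 module, in which SQL's NULL comes back as Python's None. The two starting rows and their timestamps are placeholders made up for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (Timestamp TEXT, language TEXT, problem TEXT)")
db.executemany(
    "INSERT INTO favorites VALUES (?, ?, ?)",
    [("t1", "Python", "Hello, World"), ("t2", "C", "Mario")],  # made-up rows
)

# Name only two of the three columns; the omitted Timestamp becomes NULL.
db.execute("INSERT INTO favorites (language, problem) VALUES ('SQL', 'Fiftyville')")

new_row = db.execute(
    "SELECT * FROM favorites WHERE problem = 'Fiftyville'"
).fetchone()
print(new_row)  # the first element is None, Python's rendering of NULL

# IS NULL (not = 'NULL') selects rows where the value is absent.
deleted = db.execute("DELETE FROM favorites WHERE Timestamp IS NULL").rowcount

remaining = db.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]
print(deleted, remaining)
```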
And you can Google around and see actual articles about interns at companies who had way too much access to a company database executing something like DELETE FROM favorites because they forgot the predicate. They hit Enter too soon and, boom, all of the data is now gone. So these are very destructive commands and, just like in the real world, if you don't have backups or versions of these same tables, the data can indeed be lost forever. So don't do that. Always have your WHERE, and make sure your WHERE is correct. All right. Well, let's claim that maybe Fiftyville is going to be a really popular problem among students. So much so that it becomes, overnight, everyone's favorite problem. Well, we can update the table as is. Here is the general syntax for updating rows in a table. You literally say UPDATE, the name of the table, the word SET, and then a bunch of key-value pairs: the column that you want to update, setting it equal to the value that you want to update it to, WHERE some condition is true. So what does this mean concretely? Well, let's say that we want to change everyone's favorite to SQL and Fiftyville. I could do this: UPDATE favorites SET language equal to 'SQL', comma, problem equal to 'Fiftyville', semicolon. And this is where, again, it can be dangerous, but in this case I'm going to go ahead and hit Enter without any predicate to filter this. Nothing bad seems to happen, but if I now do SELECT * FROM favorites, semicolon, all of you would seem to like Fiftyville, and there is no going back to the previous version of the table unless I quit out of this and import the whole CSV again, maybe after deleting the data entirely. All right. So how do I get rid of all of the data? Well, if you want to DELETE FROM favorites for real now, Enter. SELECT * FROM favorites. We can confirm that that was a bad idea. There's literally no data in the database anymore, but we can certainly restore from our actual CSV. 
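The difference a WHERE clause makes is easiest to see by counting how many rows each statement touches. Below is a sketch using Python's sqlite3 module with three invented rows; the cursor's rowcount attribute reports how many rows an UPDATE or DELETE modified. The first UPDATE, restricted to rows whose language is already SQL, is an illustrative addition and was not run in the lecture.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
db.executemany(
    "INSERT INTO favorites VALUES (?, ?)",
    [("Python", "Hello, World"), ("C", "Mario"), ("SQL", "Movies")],  # made up
)

# With a predicate, only the matching rows change.
targeted = db.execute(
    "UPDATE favorites SET problem = 'Fiftyville' WHERE language = 'SQL'"
).rowcount

# Without a predicate, every row in the table changes.
everything = db.execute(
    "UPDATE favorites SET language = 'SQL', problem = 'Fiftyville'"
).rowcount

after_update = db.execute("SELECT DISTINCT language, problem FROM favorites").fetchall()

# Likewise, DELETE with no WHERE empties the table.
wiped = db.execute("DELETE FROM favorites").rowcount
left = db.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]

print(targeted, everything, after_update, wiped, left)
```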
So in short, we've got SELECT, we've got INSERT, we've got UPDATE, we've got DELETE, and we've seen CREATE, albeit automatically generated by sqlite3. Maybe we'll see DROP. And actually, we can see DROP now. So recall that if I do .schema, I can see all of the tables in this here database. If I do DROP TABLE favorites, semicolon, and now again .schema, there is nothing in this database at all. So that's an even worse command to run unless you know and intend what you're doing. Questions, then, on these CRUD operations: creating, reading, updating, deleting? Yeah, here first. >> Why do you not do quotation marks around NULL? >> So NULL is a special symbol, and if you put quotation marks around it, you would literally be looking for the value N-U-L-L, which maybe was the name of a language or the name of a problem or something literally in the CSV. We are looking for the absence of that data altogether. Yeah. >> Really good question. If it's so easy to destroy data like this, are people actively backing up their data? Short answer: yes, absolutely. All of CS50's web apps and the like are automatically backed up on some schedule. Even then, we have to decide what that schedule is. And if it's daily, for instance, nightly, we could lose up to like 23 hours 59 minutes of data. 
In some cases, maybe companies would therefore version their data more tightly, like every 5 minutes or every minute, although that's going to consume a lot more space, but there already is this theme of trade-offs, certainly in computing. You can also implement forms of access control. So SQLite is lightweight; it has no notion of usernames or passwords. If you have access to the data, you can touch everything. But in the real world, with commercial and open source software like Oracle and SQL Server and Postgres and MySQL, you actually have usernames and passwords and specific permissions, so you can give users or interns the ability to select data but not update or delete or insert data, or any combination thereof. So there are defenses. Other questions on these here CRUD commands? Okay, let's go ahead and play with some real-world data. So many of you might be familiar with IMDb, the Internet Movie Database, which is a great repository of data for movies and also TV shows and actors and the like. And within IMDb's website, you can actually download TSV files, tab-separated values files, that contain a lot of the data from their website. So we went ahead and did this. We then converted that TSV data into a whole bunch of SQL tables so that we can begin to play with it in the context of TV shows. However, let's start first with a question about how you could go about modeling data for TV shows themselves. So, for instance, in advance I also created a few different spreadsheets that just allowed me to play with how I might model data, real-world data at that. So The Office is a very popular TV show. The US version here starred Steve Carell and others. So if I think about how IMDb, or maybe just even little old me with a spreadsheet, might keep track of who starred in what TV show, well, I might just use a Google Sheet like this and in the first column have a title column, where this is the title of the show, like The Office. 
And then, if it stars one person, I would put Steve Carell in the next column. But if there was a second star, I might put Rainn Wilson or John or Jenna or B.J. Novak here, column by column by column. And I could just keep adding show after show after show, one row per show, and then however many stars are in there. What might you not like about the design of this data, though? Or what might start to look odd? >> Yeah, it's a little weird that we have star, star, star. Just this repetition has tended to be bad. Anytime we're copying and pasting, it should rub you the wrong way. Other observations about it too? Yeah. >> Yeah. At the moment I've got one, two, three, four, five stars, and there are certainly TV shows with fewer stars and more. And so, okay, I can add more columns. I can just keep saying star, star, star, but then it's going to be a very ragged data set, a very sparse data set, where there are going to be a lot of blank cells for shows that have small casts, but then a lot of columns for shows that have large casts. So it just feels like this should be rubbing you the wrong way. It just feels like it's going to get messy, especially as the number of stars, let alone shows, gets larger. All right. Well, another version of this data set that I put together is this instead. So I didn't like the fact that I was going to have an arbitrary number of columns based on the specific show in question. So here I scaled back, and I just have a single column for title as before, but now a single column for star. And I decided that if a TV show has multiple stars, well, I'd just put each of the stars' names and then, to the left of them, specify the show that they're in. It seems to be a little better in that I've solved some of the redundancy problem, but I've kind of just covered up the hole in a leaky hose and now another leak has sprung up here, which is to say there's still a bad design. What's bad here? 
Yeah. >> Yeah, now I've got The Office, The Office, The Office, The Office, The Office. And that too feels like I'm wasting space. If I manually type this in, odds are eventually I'm going to screw up and one of these is going to be misspelled, which is going to break something somehow. So this too doesn't feel quite ideal. So the third and final version I whipped up to model this data, which is going to lead us to a similar design in an actual database, looks a little more arcane but is the right way, at least academically, to do things, and we'll see technologically too this is going to be a big gain. So here I now have a spreadsheet with three separate sheets. One is called shows, which is selected at the moment. Another is called people, which is not selected yet, and the third of which is called stars. What am I doing here? Well, notice that in the shows sheet, I've still got the title column, but I've decided to give The Office a unique ID. Much like a Harvard student has a unique ID number, much like an employee in a company probably has a unique employee ID, similarly have I given The Office a unique identifier that happens to be the same as it is in IMDb. Meanwhile, for all of the people that exist in the world of TV shows, for instance these five folks, I have their names as well as unique IDs for them. And those integers are unique to the people and have no connection per se to the show IDs just yet. But the third and final sheet I've whipped up is going to be a sort of cross-referencing sheet that allows me to associate shows with people. And at a glance, this looks the most arcane of the three because it's just numbers. It's just integers. But if you recall from a moment ago that The Office's unique ID was 386676, well, that's how we associated that show with this person, which happens to be Steve Carell, and so forth.
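The three-sheet layout described here can be sketched as three tables. What follows is only a minimal sketch using Python's built-in sqlite3 module, not the spreadsheet from lecture. The show ID 386676 comes from the lecture, but the person IDs 1 through 3 are placeholders, not IMDb's real identifiers.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
""")

# Each title and each name is typed exactly once.
db.execute("INSERT INTO shows VALUES (386676, 'The Office')")
db.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "Steve Carell"), (2, "Rainn Wilson"), (3, "Jenna Fischer")],  # placeholder IDs
)

# The cross-referencing table holds nothing but integers.
db.executemany("INSERT INTO stars VALUES (386676, ?)", [(1,), (2,), (3,)])

star_rows = db.execute(
    "SELECT show_id, person_id FROM stars ORDER BY person_id"
).fetchall()
title_count = db.execute(
    "SELECT COUNT(*) FROM shows WHERE title = 'The Office'"
).fetchone()[0]
print(star_rows, title_count)
```

The only thing repeated across rows is the integer 386676; the string The Office appears once, so it cannot be misspelled in one row and not another.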
Now, at a glance, not very useful to me, the human, unless I do some fancy spreadsheet stuff like VLOOKUPs, if familiar, or the like, but this is a stepping stone to how proper databases do actually store data. What I have done here is normalize the data by eliminating all redundancies except for, maximally, some redundant integers. And why is that? Well, integers, at least as we know from our days in C, are going to be a finite length. It's going to be 32 bits, maybe 64 bits, but it's always going to be the same number of bits. And that's nice because anytime you have a fixed number of bits, it lends itself to storing things nicely in an array or doing binary search, because everything is a predictable distance apart, as opposed to strings like Steve Carell or John Krasinski, where the names might vary in length. These IDs for the title of the show and these IDs for the persons are not going to vary in length because they're all just integers. But of course, this spreadsheet is now much less useful, because if I want to figure out who is in The Office, well, first I have to figure out what show this is, then I have to figure out what person this is and this is and this is. But that's where SQL is again going to swoop in and allow us to solve this problem. And indeed SQL is one of the most common ways that web applications today, mobile applications today, store any amount of data at scale. They are most likely not using simple CSV files. They are using SQLite or MySQL or Postgres or Oracle or other commercial and open source incarnations of SQL databases, and odds are IMDb might be using the same as well. All right, so let's go ahead and do this. I have created in advance a file called shows.db that contains hundreds of thousands of rows from TV shows and TV stars and other data from IMDb itself. And in a moment we'll see a database that, if drawn as a picture, looks a little something like this. There is going to be a people table. There's going to be a shows table.
There's going to be a stars table that somehow links the two. There's also going to be a writers table and a ratings table and a genres table. So overnight this sort of escalated quickly from just favorites, which was a single table, to now a real-world data set that has six tables. But here is the relational in relational databases, as these arrows are meant to imply. Right now, there are relationships across these several tables. Case in point, here is people here. And we'll see in a moment that a person in the IMDb world has an ID number, a name, and a year of birth. A show in the IMDb world has a unique ID, a title, the year it debuted, and a total number of episodes. But there's no mention of people in shows. There's no mention of shows in people. But per the arrows, there's going to be this third table here, stars, that somehow links show IDs with person IDs. And this is where relational databases get really powerful, because you can solve all of those redundancy concerns and actually enable yourself to select data much more quickly instead. But let's focus on something simple first. Let's focus just on the shows table, which pictorially might look a little something like this. So, in just a moment, I'm going to go ahead and reopen VS Code, and instead of favorites.db, I'm going to go ahead and open up a file called shows.db, which again I arrived with in advance. So, if I run sqlite3 shows.db and hit enter, I'm back at a SQLite prompt. Let me go ahead and type .schema shows just to show you what command created this here table. And it got a little more interesting already. Notice that the table is called shows and it's got one, two, three, four columns: an ID for each show, a title for each show, the year it debuted for each show, and the number of episodes. There's also clearly some mention of types and some other keywords that we haven't yet talked about. But let's focus now first on just what the data is.
The best way to wrap your mind around a new data set, if someone hands you a SQL database or you've imported a CSV into a SQL database, is just to select some data. So select star from shows semicolon. That's a lot of data flying across the screen. It's not very easy to see because some of the show names are apparently crazy long and so it's wrapping, but it's still going and going and going. I'm going to hit Control-C to interrupt it. Control-C, as with our terminals in general, is your friend. Let's run that same command, but just limit it to the first 10 shows. So there are the first 10 shows in the IMDb database of TV shows. So we've got 10 rows in this data set going back to, it looks like, the 1970s, which is roughly where their data set starts. All right. So here's the data we have in here. Well, how much is there? Well, let's go ahead and check. So select count star from shows semicolon. And now we're talking. There's 250,87 shows in this database. And if I do the same for people, select count star from people semicolon, it looks like there are 74,315 TV stars associated with this here data set. So here too the data is much more interesting and much more representative of real-world data. All right. How about the ratings? IMDb, if unfamiliar, is also a place where you could go to check the ratings from users as to whether something is a good show, a bad show, or anything in between. So let's do .schema ratings, and I'll see that, yeah, there's this table called ratings that, as we saw briefly on the screen, has a show ID and then a rating and then the total number of votes that contributed thereto, and again some data types and other syntax that we'll get to before long. But let me go ahead and just do select star from ratings limit 10 just to get a sense of what the data is. That's now what the data looks like in that table. So to a human at a glance, not that useful because you don't know what those show IDs are.
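The two exploratory commands from this stretch, a LIMIT to peek at a few rows and a COUNT to size the table, can be sketched as follows. This is a toy stand-in, assuming Python's sqlite3 module and 25 made-up rows instead of the real shows.db, whose contents are not reproduced here.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")

# Placeholder titles; the real table has hundreds of thousands of rows.
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [(f"Show {n}",) for n in range(1, 26)],
)

# Peek at a handful of rows instead of letting everything scroll by.
first_ten = db.execute("SELECT * FROM shows LIMIT 10").fetchall()

# Ask how many rows there are without printing any of them.
total = db.execute("SELECT COUNT(*) FROM shows").fetchone()[0]

print(len(first_ten), total)
```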
But in a moment, we're going to see how we can reconstitute this data by linking these tables together by way of those IDs and actually get answers to questions. So among other things, a SQL database, or a relational database more generally, supports one-to-one relationships whereby a row in one table can map to one row in another table. So this is in contrast to one-to-many, for instance. So one-to-one means one row over here somehow relates to one row over here. Again, the relational in relational database. How might we go about seeing this? Well, first here's a tour of the data types that SQLite supports. Whereas in C we had a somewhat similar list, and in Python that list went away, at least with regard to explicit types, in SQL we're back to explicitly stating, when creating our tables, what the types of those columns are. So you have integers. You have numeric, which is more of a catch-all for things like times and dates and other useful real-world data. You have real numbers, which are like floats with decimal points. You have text, which we've seen already. And then you have blobs, which is a great name, which stands for binary large objects. You can actually store raw zeros and ones, like files, in the database. Generally it's frowned upon to store files, but there are certain times where you do want to store binary data and not pure text. That's it for SQLite. There are only these five types. In other commercial and open-source SQL databases like Oracle and MySQL and Postgres, the same names I keep rattling off, you have even more data types than these. So that's among the additional features you get by using other databases as well. There are a few keywords, though, that are worth noting in SQL. You can specifically say when creating a table that this column cannot be null. If you don't want timestamp, for instance, to ever allow for null values, you can literally specify when creating that table, this column cannot be null.
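A column that cannot be null might look like the sketch below. The favorites table and its two columns are a hypothetical stand-in, and the timestamp value is made up; only the NOT NULL idea is from the lecture. In Python's sqlite3 module, a violated constraint surfaces as sqlite3.IntegrityError.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Hypothetical table: timestamp must always be present, language may be blank.
db.execute("CREATE TABLE favorites (timestamp TEXT NOT NULL, language TEXT)")
db.execute("INSERT INTO favorites VALUES ('2026-01-01 09:00:00', 'SQL')")

try:
    # No timestamp supplied, so SQLite would have to store NULL there.
    db.execute("INSERT INTO favorites (language) VALUES ('Python')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = db.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]
print(rejected, row_count)
```

The database itself refuses the second row, so the check does not depend on every program that touches the table remembering to validate its input.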
And if I try to insert data into that table with a null value, as by not providing a timestamp, the insertion will fail. And so here's where things are different from just writing Python code or certainly using a spreadsheet. You can actually have built-in defenses so that neither you nor anyone else messes up your data by inserting bogus or blank data accidentally. You can further say that things must be unique. So every element, every cell in a column must be unique to ensure that you can't accidentally put two things with the same ID: two Harvard IDs, two employee IDs that are duplicates. You can avoid that altogether. But more importantly, relational databases support these two concepts, primary keys and foreign keys. And this is where the magic really starts to happen. A primary key is the unique identifier for a table. It is the column of values that uniquely identify every row. So it's probably going to be the show ID, the person ID, the Harvard ID, the employee ID. Anytime you have a value, often numeric, often integral, that uniquely identifies rows, you simply call that a primary key. When that same ID appears in another table for cross-referencing purposes, you refer to it instead as a foreign key, because that same key is over there in another table, thus foreign. But they refer to one and the same thing. In the context of the table in which it's defined, it's primary. If it appears in some other table, it is now considered foreign. All right. So how can we make use of this? Well, let me go ahead and propose that we execute a few SQL commands as follows. If I wanted to start asking questions about ratings, I could do something like this. Select star from ratings where the rating is maybe that of a good show. So let's call it 6.0 or higher. But let's just limit this to the top 10 shows that meet that threshold. Enter. So here I now have a temporary table that gives me three columns from the ratings table.
Those are show ID, which is for the moment a useless identifier because I don't know what show it corresponds to, plus the rating value and the number of votes that contributed thereto. Well, how might I actually get to the shows that are actually highly rated at 6.0 or higher? Well, I don't need to select star. If all I care about is these top 10, I can whittle this same command down to just selecting the show IDs. So this is the answer to the question: what are the top 10 TV shows whose ratings are 6.0 or higher? Well, from the table, these are the first 10 that come back. How do I now select the shows that correspond to these values? Here's where things can be done a few different ways. I could select everything I know from the shows table where the ID of the show is in the following set. I'm going to type an open parenthesis and then, just for readability, I'm going to hit enter. The dot dot dot and angle bracket just means I'm continuing my thought. It's not executing the command yet. What is the query I now want to run? Well, it's going to be a nested query. I can now do the same thing as before. Select the show ID from the ratings table where the rating is really good, greater than or equal to 6.0. But let's then limit the total number of results to just 10. So here, just like in sort of grade school math, we have parentheses. So the first thing that's going to be executed is the thing inside the parentheses. So this is going to get me every show ID from the ratings table that has a really good rating of 6.0 or higher. That's going to return to me a column of values. I'm then going to say select star from the shows table where the ID of the show is in that list of values, but only show me 10 of those, is what I'm asking here. So what I should now see is much more useful data, namely 10 shows that are highly rated. Enter.
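The nested query just described can be sketched as follows, assuming Python's sqlite3 module and three made-up shows in place of shows.db. The table and column names follow the lecture; the titles and ratings are placeholders. The LIMIT is placed on the outer query, which is one reading of the spoken description.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE ratings (show_id INTEGER NOT NULL UNIQUE, rating REAL NOT NULL);
    INSERT INTO shows VALUES (1, 'Show A'), (2, 'Show B'), (3, 'Show C');
    INSERT INTO ratings VALUES (1, 7.5), (2, 4.2), (3, 6.0);
""")

# The inner query yields a column of show IDs; the outer query
# keeps only the shows whose id appears in that column.
rows = db.execute("""
    SELECT * FROM shows
    WHERE id IN (SELECT show_id FROM ratings WHERE rating >= 6.0)
    LIMIT 10
""").fetchall()

print(rows)
```

Show B is filtered out because its rating of 4.2 falls below the 6.0 threshold, while Show C, at exactly 6.0, is kept by the greater-than-or-equal comparison.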
And indeed I get back these 10 shows, all of whose ratings are indeed quite a bit higher. If I only care about the title, that too I can do. So let's do this again. Instead of selecting star, let's select title from shows where the ID of the show is in the following parenthetical. Select show ID from ratings where the rating is greater than or equal to 6.0. Close my parenthesis. Limit to 10. Enter. And I see the exact same thing, but just the nail being hit on the head. Just give me the titles of those top several shows. Of course, I might be able to do this differently. In other words, here's the top 10 titles. Well, what are the ratings? Like, that's why you go to IMDb or Rotten Tomatoes or the like. You want to see the actual ratings alongside the titles, not just the titles or just the ratings. Well, it turns out we're going to need another technique to do that, namely an ability to join two tables. And in fact, just as a teaser for this, if we want to start playing around with some real data, here might be, for instance, excerpts from two tables. Here's the shows table at left. Here's the ratings table at right, or a subset thereof. If I want to figure out what the rating is for a given show, wouldn't it be nice if I could somehow line these two tables up together such that, just like the tips of my fingers, I line up this value with its corresponding value over here, a cross-reference of sorts? Well, just for the sake of discussion, let me just kind of visually flip this around, though that does nothing technically underneath the hood. Let me just scooch them together now, after highlighting the common values, to demonstrate that. Well, wouldn't it be nice to take the shows table and join it with the ratings table in such a way that those IDs all line up? And we're going to have the ability to do just this. This is a lot already, and this isn't the sort of cliffhanger I'd wanted to end on, 'cause who cares about joins, but it's going to be cool.
But let's take our 10-minute Halloween candy break and come back in 10 for the next. All right, we are back. So, recall where we left off was essentially here. We had these two tables, the shows table at left and the ratings table at right. And the motivation here was, how do we actually associate shows with their respective ratings, because the ratings of course are not in the shows table. As an aside, they could be. In fact, because this is meant to demonstrate a one-to-one relationship whereby every show has one rating, we could have just put the rating and the number of votes into the shows table, but we chose not to because IMDb actually stores their ratings as a separate TSV file. And so what we tried to do for parity with that is only import into a ratings table the very TSV file that we had downloaded from them. But that too would be a solution. So at this point in the story we've got the shows table here. We've got the ratings table over here. We've noticed that there are commonalities. There are show IDs that appear in both tables. And in fact, to use some of the new vernacular, this is the primary key, the ID column here. This is that same value, but in this context it's known as a foreign key because it's in some other table. But that's going to be how we link these two things together. So how do we select for not just The Office but maybe every TV show its respective rating? Well, let's go back to VS Code, and at my SQLite prompt, let me go ahead and do this. Select star from the shows table. But let's go ahead and join the shows table with the ratings table. How do I want to join these two tables together? We'll do so on the shows table's ID column being equal to the ratings table's show_id column, and then go ahead and filter the results in the following way: where the rating we care about should still be greater than or equal to 6.0, and let's limit this to only the top 10 results.
So it's a bit more of a mouthful, but what I'm doing is selecting everything from the result of joining shows and ratings on this column with this column. And the rest of the predicate is as before. So join is going to do literally that: join these two tables as I have prescribed. When I go ahead here and hit enter, now that I have my semicolon, I get back a complete table containing everything from the shows table and everything from the ratings table with those unique identifiers lined up. Indeed, if you look at the primary key over here, the ID column, 62614 dot dot dot. Over here, you have show_id, which came from the ratings table, 62614 dot dot dot. So we've taken two tables and really joined them together, but we're only seeing a subset because I limited it to 10 such rows. Now, of course, most of this data doesn't seem very interesting if my whole goal is just to find out what the ratings are for these shows. Well, let's go ahead and in code achieve this sort of result. Let's literally join these tables together. Let's get rid of the redundancy altogether. And then really, let's whittle it down to just a title column and a rating column. So how do we do that? Well, in code, I'm going to go ahead and select more specifically the title of every show and the rating of every show from the shows table, but I'm going to join it with the ratings table on shows.id equaling ratings.show_id. And as before, I'm going to limit it to where rating is greater than or equal to 6.0 and 10 such results. Enter. And now I have a nice simple temporary table that in one column has the titles of these shows and on the right-hand side has the ratings of the shows, even though those two data sets were completely separate in two separate tables. Indeed, if we think back to where this data came from, what we've been focusing on is the shows table, and we've joined it with the ratings table. Here's the primary key for shows. Here's the foreign key for ratings.
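The join on shows.id and ratings.show_id can be sketched like this. It is a minimal example assuming Python's sqlite3 module; the three shows and their ratings are placeholders rather than rows from shows.db.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE ratings (show_id INTEGER NOT NULL UNIQUE, rating REAL NOT NULL);
    INSERT INTO shows VALUES (1, 'Show A'), (2, 'Show B'), (3, 'Show C');
    INSERT INTO ratings VALUES (1, 7.5), (2, 4.2), (3, 6.0);
""")

# Line up each show with its rating via the primary key (shows.id)
# and the foreign key (ratings.show_id), then keep two columns.
rows = db.execute("""
    SELECT title, rating FROM shows
    JOIN ratings ON shows.id = ratings.show_id
    WHERE rating >= 6.0
    LIMIT 10
""").fetchall()

print(rows)
```

Unlike the nested query, the join can return columns from both tables at once, which is what makes it possible to see a title next to its rating.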
And by convention, notice that we've adopted a certain approach. Anything that's called id here implies that it's a primary key. Anything that's something underscore id implies that it's a foreign key. And the convention we adopted, which is actually quite common, is if the table is called shows, plural, we call the foreign key show, singular, underscore id. Different companies, different communities will have different practices, but we've been consistent across all of these tables with our underscore and lowercase conventions. Yeah. >> I'm just curious how these IDs all generate and relate to each other properly. >> Really good question. How do all these IDs generate and relate to each other properly? Well, in our case, I have no idea. The Internet Movie Database people came up with these unique identifiers somehow, and we simply incorporated them into our data set. In practice, what they probably did, and what you will do, for instance, in future problem sets when generating data, is you just assign an arbitrary integer starting at one, then two, then three, then four, then five, and you just let it auto-increment all the way up, and you let the database ensure that you never have duplicate values. >> Yeah. >> Just to clarify, for the dot dot dot and arrow symbol, that's only to, like, make it look better, right? Like there's no, like... >> Correct. The dot dot dot and angle bracket that you keep seeing is just the continuation prompt, which means I have prematurely hit enter, deliberately, because I want to move everything onto the next line so it doesn't wrap ugly onto multiple lines. It is not SQL syntax. It's specific to sqlite3, and it's just a continuation of the thought. That's all. Good observation. Yeah. >> When you limit it to 10, showing how... >> Good question. When you limit something to 10, for instance, which ones do you get? You just get literally the first 10 rows from the table.
And so, if you don't use the ORDER BY keywords, it will typically be ordered in the same order in which it came from those tables. And so you're just seeing arbitrarily the first 10 that match that predicate, which is rating greater than or equal to six. We have not ordered it by rating. So I'm not getting, like, the 10.0 shows necessarily. I'm just getting the first 10 shows that are rated six or higher. And the point of that is just that I want it to fit on the screen rather than see hundreds of thousands of answers. Okay. So you might recall now that there were certainly other tables besides these. So let's see, in the broader scheme, not just shows and ratings, but let's focus on genres, if only because genres is interesting in that it's no longer a one-to-one relationship. Because of course, why would a show have multiple ratings? It sort of has its own rating. But a show could certainly belong to multiple genres. You could imagine a show being a comedy and a drama, or a musical and a comedy, or any other number of combinations of one or more genres. And so the way we've chosen to implement that here too is with a separate table called genres, which is not perfect. There's going to be some redundancies here that we have not yet eliminated. But it does indicate that we can go ahead and have multiple such values associated with each and every show. So how do we get there? Let's focus just on this. Let's go back in just a moment to VS Code and let's take a look at the schema for now genres. In genres, we have the following: a table called genres, which has two columns, a show_id, which is an integer that cannot be null, and a genre, which is text, which also cannot be null. And now, for the first time, let's actually use some of the vernacular we've introduced. Here we have an example, explicitly in SQL, that specifies when creating this table that the show_id column shall be a foreign key that references the shows table's ID column.
And admittedly, I think the syntax for creating tables is a bit of a mouthful. Even I often have to look it up to remember the order of everything. But here we have the columns listed first and then these key constraints: foreign key referencing this primary key over here. And in fact, let's rewind to look at the shows table now to see from whence we came. So if I do .schema shows, which we've done before but waved our hand at, then we'll indeed see that shows has a primary key called id, which is an integer. How do I know that? Because the very last thing in the parentheses says that the id column in this table is a primary key. Then we see that the title is text and can't be null. The year is numeric, which again I described as sort of a catch-all for other real-world numeric types that aren't purely integers or real numbers per se. Episodes is an integer. Both of those apparently can be null, because maybe IMDb just doesn't have that data for some older shows, but primary key is indeed specified here. And just for thoroughness, let me distinguish now genres from ratings. If I do .schema ratings again, which we waved our hand at earlier, it's very similar in spirit to genres in that there's a show_id column that somehow references the shows table and then some other column. In genres, that was genre. In this case, we have rating and votes, which are a real and an integer respectively. But notice this one additional constraint here. I deliberately specified that show_id in the ratings table must be unique. That is to say, you cannot have the same show ID more than once in the ratings table. Why? Because I indeed wanted a one-to-one relationship. And it would not be one-to-one if there were multiple show IDs that correspond to one ID in the shows table itself. But in genres, we're going to allow duplicates. And so we don't have mention of unique there. All right. So where does this get us?
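The three schemas described aloud here might look roughly like the sketch below. It is a reconstruction from the spoken description, assuming Python's sqlite3 module; the exact text that .schema prints in shows.db may differ, the NOT NULL on rating and votes is an assumption, and the rating and vote values inserted are placeholders.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (
        id INTEGER,
        title TEXT NOT NULL,
        year NUMERIC,
        episodes INTEGER,
        PRIMARY KEY(id)
    );
    CREATE TABLE genres (
        show_id INTEGER NOT NULL,
        genre TEXT NOT NULL,
        FOREIGN KEY(show_id) REFERENCES shows(id)
    );
    CREATE TABLE ratings (
        show_id INTEGER NOT NULL UNIQUE,  -- UNIQUE is what makes this one-to-one
        rating REAL NOT NULL,             -- NOT NULL here is an assumption
        votes INTEGER NOT NULL,           -- NOT NULL here is an assumption
        FOREIGN KEY(show_id) REFERENCES shows(id)
    );
    INSERT INTO shows VALUES (63881, 'Catweazle', 1970, 26);
""")

# genres has no UNIQUE constraint, so one show may appear many times.
db.execute("INSERT INTO genres VALUES (63881, 'Adventure')")
db.execute("INSERT INTO genres VALUES (63881, 'Comedy')")

# ratings allows each show_id only once (placeholder rating and votes).
db.execute("INSERT INTO ratings VALUES (63881, 7.0, 100)")
try:
    db.execute("INSERT INTO ratings VALUES (63881, 8.0, 200)")
    second_rating_allowed = True
except sqlite3.IntegrityError:
    second_rating_allowed = False

genre_count = db.execute("SELECT COUNT(*) FROM genres").fetchone()[0]
print(second_rating_allowed, genre_count)
```

The single keyword UNIQUE on ratings.show_id is the only structural difference between the one-to-one table and the one-to-many table.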
Well, let me go back into my terminal here after clearing all of that. And let's go ahead and just see the data to make it a little more real as we wrap our minds around it. So select star from genres limit 10, just to see the first 10. All right. So it looks like there's some comedies, adventures, comedies, family, action, sci-fi, and so forth. Well, let's go ahead and look up just one show's information. In fact, I saw this number, this ID, before. How about let's just look up this show. What is this adventure show, 63881? So select star from shows where ID equals 63881 semicolon. Okay. So this is the show called Catweazle from 1970, which had 26 episodes in total, and that was indeed its unique identifier. So that's all fine and good if I want to see something about that specific show. But as before, how do I associate Catweazle in this case with all of its genres? Well, instead of it being a one-to-one relationship necessarily, maybe Catweazle is not just an adventure. Maybe it's also a comedy and a family show. And indeed, if I go back to the results just now, you'll see that 63881 indeed lines up with adventure, comedy, and family. And then the ID changes to be about some other show. So how do I select these three answers to the question, what genre is Catweazle? Well, for this, we need to talk about one-to-many relationships and how we can get those back. Well, let's go ahead and do this now in my terminal. Let me go ahead and say the following. Select genre from the genres table where the show ID equals just that 63881, which I'm now starting to memorize: adventure, comedy, and family. So that's the answer to the question, but this certainly isn't the best way to do this, where you have to, like, look up the unique ID for the show you care about, then copy-paste it or memorize it and type it out into this query just to get the genres. It would be nice to just ask all of this in one breath. Well, we can do this, even though it's a bit more verbose.
I'm going to instead this time say select genre from genres where the show ID I care about equals, and now I'm just going to hit enter so as to move this nested query inside of parentheses. And I'm going to say, well, I don't know off the top of my head what the unique ID is for Catweazle, but I can ask the database: select the ID from the shows table where the title of the show equals Catweazle. And this now obviates the need for me to memorize or copy-paste that unique ID. I'll hit enter and close my parenthesis. I'm going to go ahead then and say semicolon, enter. And now I get back the exact same answers, but without having to know or care about these numeric values. And that's kind of the point here. Even though the database itself, the actual IMDb website, needs to use these unique identifiers to store everything in the database, we humans, generally speaking, should not have to know or care what these identifiers are. They're just meant to implement this notion of relationships, these cross-references. And so here we see an example where you can ask the question you care about without worrying about any of the underlying numbers or even seeing them as a result. All right. Well, how else might we go about doing this? Well, let me propose that we join these two tables and ask the question in a slightly different way. So here's an excerpt from the shows table. Here's an excerpt from the genres table. And clearly we could do something like we did before for ratings, where we could line these two up and kind of join them together. Just for the sake of discussion, let me flip these columns around, though that has no technical significance. And now we can clearly see 63881 appears there and here. The difference, though, because now this is a one-to-many relationship, is that it's not quite as simple as just joining the rows together. I need to kind of join it here and here and here. And the database can do this for you, albeit at some cost in redundancy.
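The "one breath" version of the genre lookup, where the inner query finds the ID by title, can be sketched as follows. This assumes Python's sqlite3 module and a two-table toy database holding only the Catweazle rows mentioned in lecture.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE genres (show_id INTEGER NOT NULL, genre TEXT NOT NULL);
    INSERT INTO shows VALUES (63881, 'Catweazle');
    INSERT INTO genres VALUES
        (63881, 'Adventure'), (63881, 'Comedy'), (63881, 'Family');
""")

# The inner query looks up the ID by title, so the caller never
# has to know or type 63881.
genres = [row[0] for row in db.execute("""
    SELECT genre FROM genres
    WHERE show_id = (SELECT id FROM shows WHERE title = 'Catweazle')
""")]

print(genres)
```

Using = rather than IN works here because the inner query is expected to return a single ID; if several shows shared the title, IN would be the safer comparison.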
So what I'm going to observe is that these IDs are all the same: primary key in this context, foreign key in this context. Well, I'm going to start to join them together here, but it's not possible to return a temporary table that's just outright missing data. You have to get the same number of rows and columns everywhere, in a grid. So what the database is going to do, if I do join these two tables together and they are participating in a one-to-many relationship with each other, is duplicate the data that's necessary to sort of make every row look the same. The downside is it might indeed be taking up some additional space, unless the database is smart and somehow using pointers or something like that underneath the hood to avoid the redundancy. But for my purposes, this is actually quite nice, because if I iterate over these rows, as I could in Python, as we'll eventually see, it's just nice to have all the data you care about in each and every row, even though it's clearly redundant. But the data is not being stored redundantly in the database. It's just temporarily being presented to me with this here redundancy. So what do I really want to have happen? Well, I really care about actually joining these two tables together and ultimately just getting back the title and the genre respectively. So let me go ahead in my VS Code here and do select title and genre from the shows table. But let's join it this time with the genres table on shows.id equaling genres.show_id. So that's quite the same as with ratings. And then where the ID equals, just for time's sake, 63881, which I know is Catweazle, but I could certainly use a nested query if I wanted to do this as before. Enter. And I get back Catweazle's three genres. And if I were to loop over this data in some kind of, like, Python code, I would have access to the title and genre with each iteration, which I claim is useful. But if I don't care about that and I just really want to select the genres, I can do this with joins too.
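The duplication that a one-to-many join produces can be seen in a small sketch, again assuming Python's sqlite3 module and a toy database holding only Catweazle's rows. Looping over the result in Python gives the title alongside the genre on every iteration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE genres (show_id INTEGER NOT NULL, genre TEXT NOT NULL);
    INSERT INTO shows VALUES (63881, 'Catweazle');
    INSERT INTO genres VALUES
        (63881, 'Adventure'), (63881, 'Comedy'), (63881, 'Family');
""")

rows = db.execute("""
    SELECT title, genre FROM shows
    JOIN genres ON shows.id = genres.show_id
    WHERE shows.id = 63881
""").fetchall()

# The title is stored once in shows but repeated in every result row.
for title, genre in rows:
    print(title, genre)

stored_titles = db.execute("SELECT COUNT(*) FROM shows").fetchone()[0]
```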
Let me just select the genre from shows, joining it with genres on shows.id equaling genres.show_id, where the ID is Catweazle's, 63881. And now I get back just that answer. So in short, what have we just seen? One, you can join two tables together and whittle down the temporary table to just the data you care about. Or, if you prefer, and if I scroll back up in my history here, you could take a fundamentally different approach but still get the same answer by simply using a nested query. I would say, as you learn SQL for the first time, I think it's quite often easier to just do multiple nested queries because you sort of work your way from the inside out, taking sort of baby steps toward the problem. If the problem in question is give me all of the genres for a specific TV show, well, first, because I know how the data is laid out in the database, I need to know the unique ID of the show I care about. Fine, that's pretty straightforward, and hence this inner query. Once you have that, you can parenthesize it, and on the outside now you can select the answer to the question you really care about, which is what is the genre that lines up with that show ID, one or more times. So in short, nested queries are probably easier, certainly when learning it for the first time, but quite powerful are these join queries, where this achieves the exact same result, especially if I were to generalize away the 63881 and do a nested query here. Sometimes you want a join; sometimes nested queries suffice. >> How does SQL do all these searches? >> Oh my goodness. How does SQL do all of these searches? What's its time complexity? We'll talk about that toward the end of today. In the most naive implementation, SQL is essentially just doing linear search from the top of the table all the way to the bottom.
However, we as the programmers are going to have the ability to optimize those queries so that the database can actually do something closer to binary search, and in general we'll be able to achieve much better performance as a result. A really good question. All right, let's go back to the big flowchart of this data set. We've looked now at shows and ratings. We've looked at shows and genres. Let's now focus on the juiciest part, the part that associates shows with people. That is, who stars in what. Think back now to what I was mocking up in the Google Sheet at the very start, whereby I wanted to somehow be able to associate The Office with Steve Carell and John Krasinski and Jenna Fischer and so forth. The right way, I claim, is going to be like this. Here's my people table, which has a primary key of id, and then the name of each person and their birth year, if known. Then we have the shows table, which we keep talking about, which again has a primary key, a title, a year, and the number of episodes thereof. And then the stars table is somewhat new, because now, when it comes to people starring in TV shows, we have a third and final type of relationship, a many-to-many relationship. Why? Because it's certainly the case that one person can be in multiple shows. And it's certainly the case that some shows have multiple people, hence many to many. So this is the third and final relationship where, just to recap, ratings was one to one, genres was one to many, and now stars is going to be many to many. All right, let's dive in. So these queries will be a bit more verbose, but again they're going to follow this principle of taking baby steps to the answer we care about. Let me go back into VS Code here, and suppose I want to find out everything about The Office that we know. So, select star from shows where title equals quote unquote The Office, semicolon. Well, that's interesting. There's a whole bunch of Offices. There was the UK version.
There's a few other variants, but the one we're probably talking about with these stars is the one that started in 2005 with 188 episodes. That's the US version, in fact. So, let me be a little more precise. Let me select everything I know from the shows table where the title equals The Office and the year equals 2005, so we don't confuse our answers with the other versions of The Office. Now, how do I go about selecting all of the people who starred in that version of The Office? Well, I already have an answer to the question of what is the ID of that version of The Office, because it's right there in front of me. And in fact, I can narrow my query more precisely. Let's just select the ID from the shows table where the title is The Office and the year is 2005: 386676. Now, I could lazily just copy-paste that or memorize it, but we're going to do this query more dynamically. I want to next, though, figure out who is in that show. So, if I have a show ID, I want to figure out who's in it. But how do I get to the people and the names of those people? I have to logically go through this cross-referencing of the stars table. So, here's where this query is going to be a bit meatier than the past ones, in that we need to do a bit more work than before. All right. Well, what's the work I need to do? Let me go ahead now and do the following. Select all of the person IDs that are associated with this show ID. So, how do I do that? Select person_id from the stars table where the show_id equals, and I could lazily copy-paste this, but let's avoid that. Where the show_id equals, let me now in parentheses do this: select id from shows where title equals quote unquote The Office and year equals 2005, and then close my parenthesis, semicolon. So what am I doing? I'm taking a second baby step, if you will. The innermost query inside the parentheses is just, again, dynamically figuring out the unique ID of The Office I care about.
The outer query is now figuring out all of the person IDs associated with that show as per the stars table. And the stars table has only two columns: show_id and person_id. That's how the linkage is done, just with those integers. Enter. I now have a column of person IDs that are starring in that version of The Office. So how do I take this one final step if I really care about their names and not their random person IDs? Well, I could go ahead and select the name from the people table where that person's ID is in the following set. So when I'm dealing with a single value, I just use equals for equality. But when I'm dealing with a whole result set, a whole column of answers, I use the preposition IN in SQL instead. So where the person's ID is in the following data set. Well, let's do the same query as before. Select all the person IDs from the stars table where the show ID I care about equals, because there's only one show I care about, and I'm going to further parenthesize this: select id from shows where title equals quote unquote The Office and year equals 2005. Enter. I'll close my parenthesis. Enter. I'll close my parenthesis. Semicolon. And now, from the inside out, I've taken three baby steps. The innermost one just gets me the show ID. The second one in the middle gets me all of the related person IDs. And the last one is really the final flourish: get me all of the names of these people based on those IDs. Enter. And now we see all of the stars in this show, beyond even the subset that we've been playing with visually on the screen. Okay, that's a lot. Let me pause here and see if there's any questions. Yeah? >> This outermost query is what gives me the names. But that query needs to know the ID of each person whose name you want. So the middle query actually gets all of those person IDs. But to get those person IDs, I need to know the show ID. So the innermost query, this one, gets me the show ID of The Office itself. All right.
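The three baby steps above can be sketched in Python with the built-in sqlite3 module. This is only an illustration under stated assumptions: the table and column names mirror the lecture's shows, people, and stars tables, and the IDs 386676 and 136797 are the ones read aloud in the lecture, but every other ID and the handful of rows are invented sample data in an in-memory database.

```python
import sqlite3

# Toy in-memory database; most IDs and rows are made up for illustration.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
    INSERT INTO shows VALUES (386676, 'The Office', 2005), (2, 'The Office', 2001);
    INSERT INTO people VALUES (136797, 'Steve Carell'), (11, 'John Krasinski'),
                              (12, 'Jenna Fischer'), (13, 'Ricky Gervais');
    INSERT INTO stars VALUES (386676, 136797), (386676, 11), (386676, 12), (2, 13);
""")

# Step 1 (innermost): the ID of the one show we care about.
find_show = "SELECT id FROM shows WHERE title = 'The Office' AND year = 2005"
show_id = db.execute(find_show).fetchone()[0]

# Step 2 (middle): every person_id linked to that show. A single value, so "=".
find_people = f"SELECT person_id FROM stars WHERE show_id = ({find_show})"
person_ids = [pid for (pid,) in db.execute(find_people)]

# Step 3 (outermost): the names for a whole set of IDs, so "IN" instead of "=".
find_names = f"SELECT name FROM people WHERE id IN ({find_people}) ORDER BY name"
names = [name for (name,) in db.execute(find_names)]

print(show_id, person_ids, names)
```

The f-strings here only splice together fixed SQL text written by the programmer; no user input is involved, which matters for the injection discussion later in the lecture.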
So at the risk of overwhelming, here are other ways you can solve the same problem. But I do claim that nested selects are probably conceptually and pragmatically the easiest way. But let's also solve this problem by doing a few joins, just so you've seen it. Actually, before we do a join, let's flip the question around first. How about all of the shows that Steve Carell has starred in besides The Office? So, let me select everything I know from the people table where the name of the person equals quote unquote Steve Carell, semicolon. All right, there seems to be only one Steve Carell in IMDb, born in 1962. That's all nice and good. What I really care about is his ID. So, I'm going to narrow this down to selecting just his ID. Now, I could memorize or copy-paste 136797, but I don't need to do that. Let's just use this as part of a nested query. Let's now select all of the show IDs from the stars table that are somehow related to Steve Carell's person ID. So where person_id equals, and I could copy-paste this, but that's generally frowned upon. So let's not do that. Let's just set it equal to a nested query where I do the same thing as before. Select id from people where name equals Steve Carell. Then close my parenthesis, semicolon. All right. He's been in a lot of TV shows, but this is not useful because I have no idea what all of these integers are. So, the final flourish: select the title from the shows table where the ID of the shows I care about is somehow in this parenthetical list. Well, what's that parenthetical list? Well, select the show_id from stars where the person_id equals Steve Carell's. What is his ID? Well, I didn't memorize it. So, I'm going to select id from people where the name of the person I care about is Steve Carell, quote unquote. Close this parenthesis. Close this parenthesis. Semicolon. Enter. And now I see all of Steve Carell's shows.
And even though we're doing this in a black and white command line environment, think about what the actual IMDb is doing with both of these queries. If you go to imdb.com and search for Steve Carell, even though there's going to be a lot of colors and pretty pictures and whatnot, you'll probably get in some form a list of all of Steve Carell's shows. Or if you search for The Office, you'll get a list in some form of all of the stars therein. I could claim then that if imdb.com is using SQL, which it very likely is, but not necessarily, they are executing queries just like we did. And when you type into the search box something like The Office or Steve Carell, they're essentially just plugging your user input into a prefabbed SQL query that they wrote in advance so as to get you the answers that you actually care about. So this is how a lot of today's websites and mobile apps are actually working. The programmer comes up with the template for the queries you might ask, and then you supply the actual data you're searching for. All right, how about now, as promised, a couple of other ways to implement these many-to-many queries, but by using joins. If I know I need to involve the shows table, the people table, and the stars table, I can actually do this all in one breath without any nested queries. Select for me the title from the shows table. But let's join that on the stars table on shows.id equaling stars.show_id. But let's additionally join on the following. Let's join it on people on stars.person_id equaling people.id. In other words, if you know conceptually that you've got these three tables and you want to somehow combine them without using nested selects, just figure out how to line them all up. So again, I'm selecting from the shows table, but I'm joining it with the stars table by lining up the shows table's primary key with the stars table's foreign key.
And I'm lining it up with the people table by lining up the stars table's foreign key with the people table's primary key. I'm just logically connecting all of the things I know to be related. And lastly, let's just say where the name I care about equals quote unquote Steve Carell, semicolon. It's a little slower for now. And this speaks to the question that was asked earlier. How is the database doing this? Well, slowly, apparently, by default, unless we optimize it. I got back essentially the same results, although there is some duplication as a result, which goes back to the filling in of blanks that I alluded to earlier. But let me show you one other technique too. But again, I would encourage you, certainly for problem set seven, to focus on nested queries when you can, because they're a little conceptually simpler. If I care about the titles of those shows, I could select title from the shows table and the stars table and the people table all at once in one breath. But I want to do so where the shows table's primary key equals the stars table's foreign key, and the people table's primary key equals the stars table's foreign key, and the name I care about is Steve Carell. In other words, this is just a third way to express the exact same idea by doing implicit joins: selecting data from all three tables as per this comma-separated list of table names, but telling the database with your predicate, the WHERE clause, how you want to line all of those tables up. If I hit enter here, cross my fingers, I should get back the same results as well, albeit with duplication, which I didn't see in the nested queries. Okay, that too was a mouthful. Let me pause here for questions. Yeah? >> To do that? >> Correct. In order to do this, you as the programmer must know the internal structure of the database, which is quite often the case, whether you created the database yourself or you work with a colleague who designed the schema for the database.
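To make the three formulations above concrete, here is a hedged sketch using Python's built-in sqlite3 module and a tiny in-memory database. The table and column names follow the lecture's schema, but all rows and IDs are invented. The sample stars table deliberately lists one (show, person) pair twice; that is only an assumption about where the duplication seen in the lecture might come from, chosen because it shows how a join returns one row per matching stars row while the nested IN query returns each show at most once.

```python
import sqlite3

# Toy in-memory database; all rows and IDs are made up for illustration.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
    INSERT INTO shows VALUES (1, 'The Office'), (2, 'Show B'), (3, 'Show C');
    INSERT INTO people VALUES (10, 'Steve Carell'), (20, 'Someone Else');
    -- The pair (1, 10) appears twice on purpose to provoke duplicate join rows.
    INSERT INTO stars VALUES (1, 10), (1, 10), (2, 10), (3, 20);
""")

def titles(sql):
    """Run a query that selects one column and return it as a plain list."""
    return [title for (title,) in db.execute(sql)]

# Explicit joins: line up primary keys with foreign keys using JOIN ... ON.
explicit = titles(
    "SELECT title FROM shows "
    "JOIN stars ON shows.id = stars.show_id "
    "JOIN people ON stars.person_id = people.id "
    "WHERE name = 'Steve Carell' ORDER BY title")

# Implicit joins: list the tables with commas and line them up in WHERE.
implicit = titles(
    "SELECT title FROM shows, stars, people "
    "WHERE shows.id = stars.show_id AND people.id = stars.person_id "
    "AND name = 'Steve Carell' ORDER BY title")

# Nested queries: each show is tested for membership once, so no repeats.
nested = titles(
    "SELECT title FROM shows WHERE id IN "
    "(SELECT show_id FROM stars WHERE person_id = "
    "(SELECT id FROM people WHERE name = 'Steve Carell')) ORDER BY title")

print(explicit, implicit, nested)
```

If the repeats are unwanted, SELECT DISTINCT title is the usual way to collapse them in the join versions.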
That said, I think your question is hinting at the challenge: I really need to know the underlying implementation details when really all I care about is the answers to my questions. In code quite often nowadays there are object-relational mappings, or ORMs for short, whereby you can use libraries that understand the underlying database schema. You as the programmer do not need to, because the library figures out how to do all of the joins for you. So for CS50 we're introducing everyone to the bottom-up understanding of how these joins work. But that too can be easily automated because of those schemas. Yeah? >> Just noticed that when you're typing, you indent. Is indentation important in SQL? >> Good question. Is indentation in SQL important? Technically no. But like with any of the languages we've talked about thus far, it is good for the humans and certainly good for the students in a context like this. Python, of the languages we've looked at, is the most rigorous, whereby indentation very much matters, and the consistency thereof. In SQL I'm just trying to pretty-print things to make them easy to grok visually. All right. So those last two queries were arguably kind of slow. Whereas with my nested queries, I actually got lucky and just, boom, I got the answer quite quickly. Those joins seem to be a step backwards, in that it was taking more time to get back the same data that I actually cared about. But that's something we can actually chip away at. It turns out that one of the other values of a relational database vis-à-vis something like a spreadsheet is that you can actually tell the database in advance how to optimize for certain queries. This is not the case for spreadsheets. If you have a lot of data in Google Sheets or Microsoft Excel or Apple Numbers, tens of thousands of rows, hundreds of thousands of rows, millions of rows, your computer's going to slow to a crawl.
And at some point, those software packages are just going to say, "Sorry, file is too big." And they're certainly not going to be terribly fast at searching the data. But with a SQL database, and relational databases more generally, you are as much the architect of it as you are the user of it in this case. And so you can tell the database in advance if you want to optimize for certain queries like select statements. So for instance, let me go back to VS Code here and, just for the sake of discussion, let's time how long it takes to find all of the shows whose name is The Office. I'm going to use a SQLite command called .timer, and I'm going to set it to on. And this is just now going to tell me, for every command I run, how long it took. I'm going to now select everything from the shows table where the title of the show equals quote unquote The Office, close quote, semicolon, enter. And that query took, let's say in real terms, 0.042 seconds. That's crazy fast. Like, it's less than a second. I mean, it's truly a split second. So no big deal. But it's a fairly simple query. But I bet we could optimize even this. Now why would you want to optimize even queries that are already pretty fast? Well, if they're very commonly being executed, and I dare say someone going to imdb.com and searching for The Office or any TV show, like that's the common case. People are looking for TV shows, movies, actors, and so forth. It'd be nice to use as little time to answer those questions as possible. Why? One, it makes for happier customers and users because you're getting them the answer faster. Two, it saves you money, because presumably if you've spent $1,000 for a server and that server has a certain amount of RAM and a certain speed CPU, or brain, it can only do so many searches per unit of time, per second, per minute, or the like. So, wouldn't it be nice if all of those searches were faster, using less time?
So, you can handle not a thousand users at once, but 2,000 users or 5,000 users, all with the same hardware. So, there's certainly upsides there. Well, how can I go about optimizing a query? Well, I can create my own index. It's another use of the create keyword in SQL, where I can tell the database to optimize for searches on a specific table and specific columns therein. I say create index, and then I come up with a name for the index, whatever I want, on the name of the table that I want to index, and then in parentheses the columns that I want to optimize for. So what does this mean in real terms? Well, let's go back to VS Code here and let me create an index called, for instance, title_index, though the name doesn't matter, on the shows table using the title column. In other words, tell the database, please expedite searches on the shows table's title column. After all, that's what I just searched on. Enter. Now, that took a moment, almost half a second, but that's an index that only has to be created once. If I do a lot of updates and deletes, it might actually take a little bit of time over the course of using the database to maintain that index. But for now, that's a one-time operation, creating the index. But watch what happens now if I scroll up in my history and go to the exact same query as before, which previously took 0.042 seconds, which yes, is fast, but not nearly as fast as the new version, which is 0.001 seconds instead. More than an order of magnitude faster. So I can handle 42 times as many users on the same database, so to speak, than I could have previously, just by building this index. So what actually is an index? Well, we come full circle to discussions in, like, week five of the class. So an index in a database is very often created using what's called a B-tree. This is not a binary tree.
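The index demonstration above can be approximated without relying on wall-clock timings, which vary from machine to machine. The following sketch uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement to ask the database how it intends to run the query before and after the index exists. The table is a made-up stand-in with generated titles, not the real IMDb data; title_index is the index name used in the lecture, and the exact wording of the plan text differs between SQLite versions, so the code only checks whether the index is mentioned.

```python
import sqlite3

# Toy in-memory table with generated titles standing in for the real data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [(f"Show {i}",) for i in range(1000)] + [("The Office",)],
)

def plan(sql):
    """Return SQLite's description of how it would execute the query."""
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM shows WHERE title = 'The Office'"

before = plan(query)  # without an index: a scan of the whole table
db.execute("CREATE INDEX title_index ON shows (title)")
after = plan(query)   # with the index: a search that names title_index

print("before:", before)
print("after: ", after)
```

The result of the query is the same either way; only the route the database takes to reach it changes.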
A B-tree is its own distinct structure that's very similar in spirit, in that it's fairly shallow because most of the nodes have children, but a node doesn't necessarily have two children. It might have more children. And in fact, the more children the nodes have, the higher up you can pull all of the leaf nodes and the shorter you can make the height of the tree. So this is just a generic representation of a B-tree. But what this implies is that when I am now searching for titles like The Office, the database doesn't have to do the default behavior, which is start at the top and use linear search all the way to the bottom. If it has proactively built up an index in memory thanks to my command, it now has a tree-like structure storing those titles that allows it to find the same data in some logarithmic time, whether it's log base 2 or some other base, much more quickly. And that's how we went from 0.042 to 0.001 seconds instead in this case here. Questions then on these indexes? No. All right. Well, let's propose that we can combine some of today's ideas. It turns out that now we're getting to the point in the course where you're not just choosing between this language and another. You're generally using a suite of languages to solve problems. And indeed, in the coming weeks of the class, when we transition to web-based applications, you're going to use a bit of Python, you're going to use a bit of SQL, you're going to use a bit of JavaScript and two other languages called HTML and CSS. You might be using like five different languages at a time just to build one application. Why? Because some of them are better for the job than others. And indeed, that's the ecosystem in which real-world software development is done. Well, to make this bridge, we have a version of the CS50 library, recall, for Python, which has functions like get_string, even though it's not that useful because it's just like the input function, but also get_int and get_float.
But also, in the CS50 library for Python, we have a module that specifically makes it easier to use SQL from Python code. After all, wouldn't it be nice if I could get the best of both worlds and implement an interactive program in Python, but one that uses SQL to actually get back data? Or I can build a website that allows people to search for TV shows or TV stars and actually get that data from a database, but use Python to generate the web pages themselves. Well, we have some documentation for this library here, but I'm going to go ahead and use it in real time to show you how much more easily you can solve certain problems by using each tool for what it's good at. So, let's go back to VS Code here. Let me exit out of SQLite and get back to my normal terminal. And let me go ahead and, let's say, minimize my terminal here. Actually, let's go ahead and open up favorites.py, which is where we left off before. And recall that in the last version of favorites.py, we had simply used a dictionary to go about keeping track of how many of you said Python or C or Scratch. And when I last ran this program with python of favorites.py, the answer looked like this. Now notice that it's not sorted alphabetically, otherwise C would be first. And it's also not sorted numerically, otherwise C would be second. So it would be nice in Python to maybe exercise some control over this. But I stopped doing that before because it gets very annoying quickly. And by this I mean the following. Let me go back into VS Code here and into favorites.py. And if I wanted to sort the counts here, I could do this. I could change my loop from iterating for favorite in counts to for favorite in sorted(counts). So, this is actually not too bad thus far. I can actually sort dictionaries pretty readily. So, now if I run this, and let me make my terminal a little bit taller so we can see both results. If I run the program now, you'll see that it's sorted alphabetically by key.
So apparently when you use the sorted function in Python and pass it a dictionary, you can still iterate over all of the key-value pairs in that dictionary, but it's been sorted now by key. So that's nice if that's to be my goal, but maybe that's not really my goal. And here's how, alternatively, I could sort by value: the 190, the 58, and the 24. I can still use the sorted function, but I need to tell Python to use a key, a sorting key, of the counts dictionary's get function. And then if I run it again, I now see it's sorted by value. But darn it, it's now sorted in the opposite order. I see Scratch at 24, then 58, then 190. If I want to reverse it, well, then I have to go up here and add another named parameter, reverse equals True. I can run it another time. And now I get the result I care about. Long story short, this is just very annoying to have to use that amount of code to actually answer relatively simple questions. And this is why we did transition for much of today to a declarative language like SQL that just lets me select what I care about in that data. So if I again go back into my database version with sqlite3 of favorites.db, I'll maximize my terminal window. What did we do before? Well, we can select from the database, let's see, favorite, count star from favorites group by favorite, semicolon. Whoops. Oh, sorry. What did we do? We did select language, count star from favorites group by favorite. Oh, damn it. What happened? Oh, we deleted it. See, this is why you don't use the delete or drop command. So, I'm not going to demonstrate this again, but recall before break that when we last selected this information, we used the group by command to actually group by the language in question, and we got back all the counts. But then we were very easily able to reorder things by actually just using order by and then doing something in ascending order or, for instance, descending order instead.
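The three sorting variations described above can be sketched as follows. The counts 190, 58, and 24 are the ones mentioned in the lecture; pairing 190 with Python and 58 with C is inferred from the discussion, and the dictionary is written out by hand here rather than built from the CSV file.

```python
# Counts as discussed in the lecture; the dictionary is hard-coded for brevity.
counts = {"Python": 190, "Scratch": 24, "C": 58}

# Passing a dictionary to sorted() sorts its keys alphabetically.
by_key = sorted(counts)

# key=counts.get tells sorted() to compare each key by its count instead.
by_value = sorted(counts, key=counts.get)

# reverse=True flips the order so the largest count comes first.
by_value_desc = sorted(counts, key=counts.get, reverse=True)

for favorite in by_value_desc:
    print(f"{favorite}: {counts[favorite]}")
```

Note that sorted returns a list of keys, not a new dictionary, which is why the loop still looks each count up in counts.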
Well, now let's actually combine these worlds of Python and SQL together to write first a program that does just that. But to do this, we're going to need to restore that database. So let's go ahead and do this. Let's remove favorites.db, which is just a file in my account. Let's go ahead and run sqlite3 of favorites.db to create a new version thereof. Let's now go ahead and change my mode, as we did earlier in class, to CSV with .mode csv. Let's now do .import of favorites.csv into a table called favorites. And now let's do .quit. And when I do ls, okay, now it's back, favorites.db, in addition to today's other files. Now let me go ahead and run sqlite3 of favorites.db. And just as a sanity check, select star from favorites, semicolon. There's all of the data back, minus the addition and subtraction that we ourselves made earlier manually. And let's go ahead and in SQL do select language, count star from favorites and group by language, but let's order by count star in descending order. And that's one of the last commands we ran with this file. And there is the answer in a single line of code instead of some 17 lines of code, plus or minus some white space here. Can we merge now these two ideas? Well, let's see how to do this. Let's go back into favorites.py here and make a new and improved version of it that actually uses SQL and no dictionary, no for loop, no try-except or any of this. Instead, let's go ahead and from CS50's own library import a SQL function, which will give me access to this functionality. Let's create a variable called db by convention, but I could call it anything I want, and set it equal to the return value of CS50's SQL function, and pass to that function the path to the database file I want to open. This is a little weird, but the syntax here is sqlite, without the 3, colon, slash slash slash, favorites.db. This syntax, otherwise known as a URI, is going to allow us to use the SQLite protocol, so to speak, in order to open up favorites.db, which is the very file I was just experimenting with manually in my terminal.
Here now is how I can execute a SQL query in Python using CS50's library. Now, as an aside, even though this is indeed meant to be a training wheel, CS50's library is just easier to use than a lot of the real-world libraries that make this possible. So because we spend so relatively little time on this, we're still using this training wheel for this. Give me a variable called rows, because I want to get back all of the rows from this table that contain those languages, and do db.execute. The only function that's useful in the CS50 library for SQL is this execute function, which allows me to write literally a line of SQL like select language, count star from favorites group by language order by count star in descending order. Just to make my life easier, I'm going to add that alias trick that we saw before. So AS n, to rename the count to n. And then here I can just do order by n instead. It's a little long, but notice that now I'm using SQL as a string that I'm passing as an argument to this db.execute function. So at the very end of this, I've got to close my quote and close my parenthesis so as to use one language, in effect, inside of another. Now assuming I do get back a temporary table's rows with that line of code on line five, let's do this. For each row in rows, go ahead and do the following. Create a variable called language and set it equal to row quote unquote language. Then create another variable called n, for instance, and set it equal to row quote unquote n. And then let's just go ahead and print out language and n, respectively. So what does CS50's library do? It returns by design a list of rows. Each of those rows is a dictionary of key-value pairs. So when I do for row in rows, this is just iterating over a list of values. And we've done that over the past couple of weeks.
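The program described above relies on CS50's own library. As a rough stand-in that runs with nothing beyond the standard library, the sketch below uses Python's built-in sqlite3 module and converts each result row to a dictionary, to imitate the list of dictionaries the lecture says db.execute returns. It is an approximation under stated assumptions, not CS50's implementation: the in-memory table, its two columns, and the handful of sample rows are invented so the example is self-contained.

```python
import sqlite3

# In-memory stand-in for favorites.db; the rows are made-up sample data.
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # lets each row be indexed by column name
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
db.executemany("INSERT INTO favorites VALUES (?, ?)", [
    ("Python", "Hello, World"), ("Python", "Mario"), ("Python", "Hello, World"),
    ("C", "Mario"), ("C", "Hello, It's Me"), ("Scratch", "Starting from Scratch"),
])

# The same query as in the lecture: SQL does the grouping, counting, and
# sorting, and the AS n alias gives the count column a convenient name.
rows = [dict(row) for row in db.execute(
    "SELECT language, COUNT(*) AS n FROM favorites "
    "GROUP BY language ORDER BY n DESC")]

# Each row is a dictionary keyed by the selected column names.
for row in rows:
    print(row["language"], row["n"])
```

With the real favorites.db file, sqlite3.connect("favorites.db") would take the place of the in-memory database.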
Inside of this loop, I'm just creating temporarily two variables, language and n, to show you that each row is indeed a dictionary, which means I can index into it using strings like quote unquote language and quote unquote n, because those are the columns that I selected using this query up above. Strictly speaking, I don't even need these variables. I can just get rid of them and, a little more succinctly, just pass in row bracket language and then row bracket n instead. So let me go down to my terminal window here, exit out of SQLite, run python of favorites.py in this form, enter, and I get back, it would seem, the same exact answer: 190, 58, and 24 in this case. Questions now on this co-mingling of languages? All right, how about one final thing? Once we have the ability to use Python, now we can in fact make things interactive. So for instance, let me close my terminal temporarily. Let me go ahead and now ask for some user input. So after opening the database, let's do this. Let's ask the human, using Python's input function or equivalently CS50's get_string function, for their favorite, and store it in a variable called favorite. Then let's do a SQL query that selects that data. Rows equals db.execute, select, and let's see how many people selected this favorite problem, rather, not TV show, how about favorite problem from our favorites data set. So select count star as n from the favorites table where the problem in question equals, well, now I need to put the user's input. I don't know what that is yet because they haven't typed it in yet. So, what I'm going to go ahead and do is put a placeholder and say favorite, close quote, and make this whole thing an f-string. Then I'm going to go down here, and I don't need to iterate because ideally I'm just getting back a single answer. How many people chose this problem as their favorite? So, I'm going to say that the row I care about is simply the first row. So, rows is a list.
So, rows bracket zero is the first and only row in that list. And then let's go ahead and print out row quote unquote n. Let's see the result here and then see what happens. Let me put some single quotes here and single quotes here. Let me open my terminal. Let me do python of favorites.py and I'll say Hello, World. Enter. And as before at the start of class, 42 of you like that. However, this is not, not, not how you should ever write SQL code in Python. What could go wrong with this code? Nothing went wrong a moment ago, but what could go wrong? Yeah, the user input. How so? >> True. I don't know what those are yet, but we're about to go there. What even more simplistically could go wrong by plugging in the user's input here? Yeah? >> Like hello. >> Exactly. If I inputted the other problem we played with, Hello, It's Me, where it was I-T apostrophe S, that, if interpolated right here, is clearly going to confuse the single quotes such that who knows what's going to come back. Now, in the best case, the code might just not work and I'll get some kind of error on the screen, which is not great for the user because the program is not going to be useful. There's no user-friendly error message. But in the worst case, the user could do something incredibly malicious if you are simply blindly trusting user input and plugging their input into a SQL query that you yourself constructed. Why? What if the user types something crazy like the word delete or drop or update or any of those destructive commands that we saw earlier and somehow tricks your code into executing maybe the select but then eventually an additional query like a delete? Maybe they type in a semicolon and then delete, or a semicolon and then drop, or something like that. This is the biggest threat to taking user input and trusting it in the context of databases. And it's called, as one of your classmates knows already, a SQL injection attack.
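The "best case" failure described above, where an apostrophe in the input simply breaks the query, can be reproduced in a few lines. This sketch uses Python's built-in sqlite3 module instead of CS50's library and an invented one-row table; the variable favorite stands in for whatever a user might type, and the f-string is shown only as the pattern to avoid.

```python
import sqlite3

# In-memory stand-in for favorites.db with one made-up row.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
db.execute("INSERT INTO favorites VALUES ('C', 'Hello, It''s Me')")

favorite = "Hello, It's Me"  # stand-in for what a user might type

# DON'T do this: the apostrophe in the input ends the quoted string early,
# so the database sees ... WHERE problem = 'Hello, It' followed by stray text.
unsafe_sql = f"SELECT COUNT(*) AS n FROM favorites WHERE problem = '{favorite}'"

try:
    db.execute(unsafe_sql)
    outcome = "ran"
except sqlite3.OperationalError as error:
    outcome = f"failed: {error}"

print(unsafe_sql)
print(outcome)
```

Here the damage is only an error message, but the same mechanism is what lets carefully chosen input change the meaning of a query.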
A SQL injection attack is the ability for an adversary or an unknowing user to somehow inject code into your database. A SQL injection attack then might look something like this in the real world. Here, for instance, is the login screen to github.com. They do actually use SQL, among other languages, underneath the hood, I believe, not necessarily for this, but suppose they did. And when logging into github.com you're prompted for your username or email address and then of course your password. Well, what if I know a little something about SQL, and suppose for the sake of discussion GitHub is using SQLite, which they're not using because it's not meant for massive data sets like this. But suppose they are. And just to be malicious, I type in my username, malan@harvard.edu, but then I use a single quote and then dash dash. Well, the single quote is there, me being an adversary in the story, because maybe I can confuse their code by closing their quotes sooner than they intended. And we haven't talked about this yet, but it turns out that dash dash in SQL is the comment marker. So it's like hash in Python or slash slash in C. This in SQL means ignore everything to the right. That alone can be used fairly maliciously, as follows. Here, for instance, could be the code that GitHub is using underneath the hood, whereby they might have some Python code, and heck, maybe they're using the CS50 library, that executes this pre-made query: select star from the users table where the username equals this question mark and the password equals this question mark, passing in username and password, for instance. But if they are trusting the username and password I typed in and just plugging them right in there, they could be vulnerable to indeed a SQL injection attack. For instance, this code, we'll soon see, is actually the right way to do it. But suppose they were doing it with f-strings like I started to in my version of favorites.py. Same thing: select star from users
where username equals this username and password equals this password, and the little f here means here's a format string. What could go wrong? Well, let me actually paste in the malan@harvard.edu single quote, dash dash text here. Notice that this single quote and this single quote are meant to surround the username. And same thing for the password there. But watch what happens when I type in my data. malan@harvard.edu single quote. So this would seem to finish the thought prematurely. And then it says dash dash, and so that just means ignore everything else. And so the effect here is essentially to gray out all of that stuff because it's effectively been commented out. So what GitHub ends up doing accidentally in this case is selecting star from users where username is malan@harvard.edu, irrespective of what his password actually is. And if you assume that down here they've got some conditional logic like, well, if we get back some rows, that means that Malan is in fact a registered user, go ahead and log him in. We don't know what the code looks like, so it's dot dot dot. You've just enabled anyone on the internet to log in as me or anyone else just by suffixing their input with a single quote and dash dash. And that's the least of our concerns. If we additionally went in there and maybe instead of dash dash we put a semicolon and then delete from users or drop users, we could cause massive havoc on their database. This happens all the time. Even now in the current year, you can Google around and see examples of companies that have not used proper sanitization of user input. And it's not just the intern. It's like random people on the internet are accessing or destroying their data maliciously. So what is the solution to a problem like this? Well, one, do not use format strings in Python to simply plug in user input. But the more important lesson is never trust user input.
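The login bypass just described can be reproduced in a small sketch, assuming Python's built-in sqlite3 module, a made-up users table, and a hypothetical login_unsafely function (none of which is GitHub's or the lecture's actual code; the plaintext password is a simplification for illustration only):

```python
import sqlite3

# Hypothetical users table; storing a plaintext password is a
# simplification for this demonstration, not a recommendation.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES ('malan@harvard.edu', 'correct horse')")


def login_unsafely(username, password):
    """Return True if any row matches; the f-string makes this injectable."""
    query = (
        f"SELECT * FROM users "
        f"WHERE username = '{username}' AND password = '{password}'"
    )
    rows = db.execute(query).fetchall()
    return len(rows) > 0


# A wrong password is rejected, as intended.
print(login_unsafely("malan@harvard.edu", "wrong guess"))

# Appending '-- closes the username string early and comments out the
# password check, so the same wrong password is now accepted.
print(login_unsafely("malan@harvard.edu'--", "wrong guess"))
```

The first call prints False and the second prints True, even though the password is wrong both times.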
Either they're going to do something accidentally or they're going to do something maliciously, and you do not want that to happen. So the solution then is to use a library. Almost always use a library. This is not a wheel you should reinvent yourself. And by library I mean something like this. If you instead use a library like CS50's and you don't just use f-strings, you'll see in a moment you use question marks. What will happen is this. When the user goes and types in malan@harvard.edu single quote dash dash, that's fine. Let them put weird scary characters like single quotes in their input. The library will take charge of escaping user input. So anything dangerous in their input will be changed from one single quote to two, because we saw earlier today that that's how you escape a character. And that means that now what you have is, in effect, my username is apparently malan@harvard.edu apostrophe dash dash, and that's my username. Well, that's obviously not a real email address. It's not a real username. This is just going to return false. No rows are actually going to come back. And the way to do this now in our favorites example, analogously, is in VS Code here to actually go up into this execute line. Don't use an f-string. Change the value of problem to be a placeholder instead, and then pass into this execute function one or more arguments that will be substituted in for that question mark. And this is not a CS50 thing. This is an industry convention whereby you quite often use literally a question mark. And that means that whatever this variable's value is will get plugged into that question mark for you. But the single quotes will be added. Any dangerous characters will be escaped for you. And at that point, you can trust that the user can type in anything they want. Your code is not going to break. You can see hints of this actually in the real world.
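The question-mark placeholder convention can be sketched as follows, again assuming Python's built-in sqlite3 module rather than the CS50 library, with a made-up users table and a hypothetical login function. One difference worth labeling: sqlite3 hands the values to the database separately from the SQL text instead of rewriting quotes inside the string, but the effect for the caller is the same, since the input is only ever treated as data.

```python
import sqlite3

# Hypothetical users table, populated with placeholders as well.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute(
    "INSERT INTO users (username, password) VALUES (?, ?)",
    ("malan@harvard.edu", "correct horse"),
)


def login(username, password):
    """Return True only if both values match a row; input is never parsed as SQL."""
    rows = db.execute(
        "SELECT * FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchall()
    return len(rows) > 0


# The real credentials still work.
print(login("malan@harvard.edu", "correct horse"))

# The injection attempt is now just an odd username that matches no row.
print(login("malan@harvard.edu'--", "anything"))
```

Here the first call prints True and the second prints False: the quote and the dash dash are compared literally against the username column.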
If you've ever gone to a website and they tell you like, oh, you can't you like for passwords for instance, like all of us probably intuitively know that you should have pretty long uh hard to guess passwords with letters and numbers and punctuation symbols. Sometimes websites very stupidly prohibit you from using certain punctuation symbols, which should drive you nuts because there's no computational reason that you have to put the onus on the user to sanitize their own input. But quite likely those websites have kind of learned part of this lesson and they know some characters can be dangerous in SQL like semicolons or single quotes or the like and they just don't want you to ever type those in. Even though there are solutions to this problem, use a library that someone else smarter than you u with more history of writing code than you has used that's open source so that many people have seen it and banged on it over the years so that this problem is not something you're vulnerable to. questions then on what these here SQL injection attacks are all about. Yeah, >> I guess you're telling the user what not to use, you're also telling them what system you're using and so maybe that >> Good point. So if by also telling people what characters they shouldn't use, you're leaking information because a smart adversary might know, oh well, if they don't want me using that symbol, they're probably using this language or this technology. Yes, no good comes from telling the world more information than they need to know. So that's another good paranoia to have. How about one other issue before we come full circle to the SQL injection attacks. There's another challenge with relational databases and with SQL uh itself, namely race conditions. This isn't so much a problem when I'm writing a a little program here on my own computer. 
uh but when you're running SQL code on a database in the real world in the cloud where you have many different servers talking to that database and many different users uh talking to those web servers as is going to be the case at Meta and Google and Microsoft and any number of popular companies nowadays and even some of CS50's own apps uses centralized SQL databases where if multiple people are trying to do the same thing on them at the same time submit their homework run check 50 we too are vulnerable to what are called race conditions. So what is a race condition? Well, the way I learned this back in the day when taking a course on databases and operating systems uh more generally was to think of a scenario like this. Maybe in your dorm, you and your roommates have a little dorm fridge and you're both in the habit of really liking to drink milk as the story was told to us. And so maybe one of you comes home from class one day and you get get to your room, look in the fridge, there's no milk in there. And so you decide to walk across the street to CVS or some other store to get milk. Meanwhile, your roommate comes home from their class and opens the fridge and it's like, "Oh, we're out of milk. Let me go to the store, too." And for the sake of the story, they go to a different store altogether so that you don't run into each other and the problem solves itself. So now both of you are on your way to a store to get milk. Time passes. You both come home. One of you puts a jug of milk in the fridge. The other one gets home and is like, "Ah, damn it." Like we already got milk. I can't fit this milk in the fridge or now it's too much milk. We don't really like milk this much. It's going to go bad. Like very bad outcome here. Having too much milk is the moral of the story. But what's the what stupid story? What's the What's the real takeaway? Why did we find ourselves in a situation where we ended up with too milk, too much milk? 
>> We didn't know what the other person >> we didn't know what the other person was doing. And to really geek out on this, we inspected the state of a variable that was in the process of being updated by someone else. And this is a thing in computing as far back as Scratch. Recall with Scratch, you could have multiple scripts running at the same time for a single sprite because Scratch in effect is multi-threaded. You can have a single sprite doing multiple things in parallel by having those multiple scripts. Similarly, here your room is sort of multi-threaded because you have two independent beings who can both go to the store, solve the same problem in parallel. The problem though is that if one is not aware that the other is doing that work already, you might make poor decisions. So, in the real world, what should the first roommate have done after inspecting the state of the refrigerator and realizing, "Oh, we're out of milk." Okay, call the other roommate or maybe more simply like put a note on the door or like maybe dramatically lock the refrigerator somehow. And in fact, that's a term of art in databases is to actually use a database lock so that if you are in the process of updating the value in the database, lock it so that no one else can inspect the value of that database and potentially make a poor decision. So when might this actually happen in the real world rather than the contrived milk example. So there are a lot of social media posts nowadays that are quite popular. To this day, as of today, this is still the most popular Instagram post for instance. And imagine when this was first posted, hundreds, thousands, hundreds of thousands of people might have all been clicking the heart icon essentially at the same time. Now, Meta uh the company behind Instagram presumably has lots and lots of different servers, but let's suppose for the sake of discussion they have a single database, which is not true, but the danger is still there. 
Even with multiple databases, all of these different web servers are talking to the same database. And suppose those servers are using Python code and, hey, the CS50 library, which might look a little something like this in order to decide how to update the total number of likes for an Instagram post. The first line of code running on Meta's servers might say this. Get these rows as follows: execute a query like select the current number of likes from the posts table where the ID of the post is whatever it is, 1 2 3 4 5 6, whatever. Notice no SQL injection attacks are possible here because I'm using the placeholder, not an f-string. Then the next line of code running on Meta's server maybe just stores in a variable, just to make the code more readable, the first row's likes column. So again, it's the CS50 library in the story. Rows is a list of dictionaries, so this is the first such element in the list, and this is the likes column in the temporary table we just selected. Lastly, what do we want to do? Well, we want to plus plus, essentially, that total. So we update the posts table, setting the number of likes equal to this question mark where the ID equals this question mark. And we didn't see this already, but the CS50 library indeed supports multiple arguments after the SQL string. I'm going to update the number of likes to be likes plus one, plugging in the same ID of that post. So in short, take on faith that it's quite common that in order to achieve one small goal, like updating the number of likes, it stands to reason you might need to do two database queries or three lines of code. Now if these lines of code are executing on multiple web servers, you could certainly imagine that if people are hitting the like button pretty much at the same time, maybe one server is going to execute this first line of code and it's going to get its answer. Maybe there's a hundred likes at this point in the story.
And then just by chance on another server, this line of code is also executed, but it too gets the same answer. There's currently a hundred likes. Meanwhile, the first server in the story continues to do its execution of code such that it updates the number of likes from 100 to 101. But because the other server was essentially running the same code in parallel, it's going to make the same mathematical decision and update the number of likes from 100 to 101. But at this point in the story, the number of likes should obviously be 102, so we've lost data. And that's one of the dangers of a race condition: you'll end up with an inaccurate result. And for a company like Meta, they don't want to go losing data like likes like this. That actually drives engagement and so forth. And so that's genuinely a technical, if not a business, problem as well. So it's analogous to the milk problem, but actually at scale. So what's the solution? There's a bunch of different ways, but conceptually, we just want to lock the database when this logic is being executed such that when one server is updating the number of likes, no one else should be allowed to update the like count at the same time. Now, that's a little crazy for someone as big as Meta because you're really just serializing all of these likes and slowing things down. So there's more fine-grained control nowadays, namely called transactions, where you can essentially lock not the whole table and certainly not the whole database, but just the row in question, for instance. And so you would use commands in SQL like begin transaction and then execute the lines of code that you want. And then when you're ready to commit it, that is, save it, you use the commit command. But if something goes wrong or you get interrupted, you can actually roll back the whole thing.
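The lost update and the transactional fix can be sketched in one short program. This is an illustration under stated assumptions, not the lecture's slide code: it uses Python's built-in sqlite3 module, a made-up posts table, hypothetical helper names (read_likes, write_likes, like), and it simulates the two servers by interleaving their steps by hand on one connection. SQLite's BEGIN IMMEDIATE, used here, takes a database-wide write lock rather than the row-level lock larger database servers can offer.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so the only
# transactions are the ones started explicitly with BEGIN below.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER)")
db.execute("INSERT INTO posts (id, likes) VALUES (?, ?)", (1, 100))


def read_likes(post_id):
    row = db.execute("SELECT likes FROM posts WHERE id = ?", (post_id,)).fetchone()
    return row[0]


def write_likes(post_id, likes):
    db.execute("UPDATE posts SET likes = ? WHERE id = ?", (likes, post_id))


# Simulated race: servers A and B both read before either one writes.
seen_by_a = read_likes(1)
seen_by_b = read_likes(1)
write_likes(1, seen_by_a + 1)
write_likes(1, seen_by_b + 1)
lost_update_total = read_likes(1)
print(lost_update_total)  # 101: one of the two likes was lost


def like(post_id):
    """Read and write inside one transaction so no other writer can interleave."""
    db.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        write_likes(post_id, read_likes(post_id) + 1)
    except Exception:
        db.execute("ROLLBACK")  # undo everything if anything went wrong
        raise
    else:
        db.execute("COMMIT")  # save both steps together


write_likes(1, 100)  # reset the counter for the second experiment
like(1)
like(1)
safe_total = read_likes(1)
print(safe_total)  # 102: both likes were counted
```

A simpler design, where the database allows it, is a single statement such as UPDATE posts SET likes = likes + 1 WHERE id = ?, which avoids the separate read entirely.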
And what this kind of code does in effect, by using more verbose CS50 and Python code like this, is ensure that those three lines of code inside, or technically the two database queries inside, will either both be executed or not at all. They will not be interrupted. And that's the fundamental solution to this problem, analogous to putting a lock on the fridge or leaving a note or calling your roommate, preventing them from making the same decision themselves. Questions then on these race conditions? The solutions, again, even though this won't be germane for CS50, are simply techniques like locks and what we called transactions. No? All right, then a final moment to end on. We would not be a computer science course if we didn't introduce you to a few pieces of CS canon. Here is a sort of meme that's circulated for years when it comes to optical character recognition, OCR, of toll booths trying to detect your license plate automatically. This is someone trying to have a funny old time tricking the city into deleting their database altogether. Because if you're just scanning this off of someone's license plate or front of the car and just blindly plugging it in without sanitizing their input, escaping their input with something like a good library, you might very well drop the entire database. As an aside, someone did something similar too, where I think they made their license plate NULL, N-U-L-L, which just confused the heck out of the system, too, because the programmers didn't understand why null was all over the place when lights were being run and whatnot. And lastly, a very famed character in the world of XKCD, as computer science circles go, is this. So we'll end as we've done before on an awkward silence as you process this here canonical CS joke. >> Now you, too, know who Bobby Tables is. All right, that's it for week seven. We'll see you next time. [applause] [music] >> All right.
This is CS50 and this is our lecture on artificial intelligence or AI. Particularly for all of those family members who are here in the audience with us for the first time. In fact, uh for those students among us, maybe a round of applause for all of the family members who have come here today to join you. [applause] Nice. So nice to see everyone. And as CS50 students already know, it's sort of a thing in programming circles to uh have a rubber duck on your desk. Indeed, a few weeks back, we gave one to all CS50 students. And the motivation is to have someone something to talk to in the presence of a bug or mistake in your code or confusion you're having when it comes to solving some problem. And the idea is that in the absence of having a friend, family member, TA of whom you can ask questions is to literally verbalize your confusion, your question to this inanimate object on your desk. And in that process of verbalizing your own confusion and explaining yourself, quite often does that proverbial light bulb go off over your head and voila, problem is solved. Now, as CS50 students also know, we sort of virtualized that rubber duck over the past few years and most recently in a form of uh this guy here. So, in students programming environment within CS50, a tool called Visual Studio Code at a URL of CS50.dev, they have a virtual rubber duck available available to them at all times. And early on in the very first version of this rubber duck, it was a chat window that looked like this. And if students had a question, they could simply type into the chat window something like, "I'm hoping you can help me solve a problem." And for multiple years, all the CS50 duck did was respond with one, two, or three quacks. Uh we have anecdotal evidence to suggest that that alone was enough for answering students questions because it was in that process of like actually typing out the confusion that you realize, oh, I'm doing something silly and you figure it out on your own. 
But of course, now that we live in an age of ChatGPT and Claude and Gemini and all of these other AI-based tools, it came as no surprise perhaps when in 2023 this same duck started responding to students in English. And that now is the tool that they have available, which is in effect meant to be a less helpful version of ChatGPT, one that doesn't just spoil answers outright but tries to guide them to solutions, akin to any good teacher or tutor. And so today's lecture is indeed on just that and the underlying building blocks that make possible that there rubber duck and all of the AI with which we're all increasingly familiar, namely generative artificial intelligence, using this technology known as AI to generate something, whether that's images or sounds or video or text. And in fact, what we thought we'd do to get everyone involved early on is, if you have a phone by your side, if you'd like to go ahead and scan this QR code here, that's going to lead you to a polling station where you can buzz in with some answers. CS50's preceptor Kelly is going to kindly join me here on stage to help run the keyboard. And what we're about to do is play a little game and see just how good we humans are right now at distinguishing AI from reality. And so we'll borrow some data from the New York Times, which a couple years back actually published some examples of AI and not AI, and we'll see just how good this technology has gotten. So here we have two photographs on the screen. In a moment, you'll be asked on your phone, if you were successful in scanning that code, which one of these is AI, left or right. So hopefully on your phone here, if you want to go ahead and swipe to the next screen, we'll activate the poll here. In a moment, you should see on your phone a prompt inviting you to select left or right. And feel free to raise your hand if you're not seeing that. But it looks like the responses are coming in.
And at the risk of spoiling, it looks like 70% plus of you think it is the answer on the right. And if Kelly, maybe we could swipe back to the two photographs. In this particular case, yes, it was in fact the one on the right. Maybe it looked a little too good or maybe a little too unreal. Maybe. Let's see maybe a couple of other examples. So, same QR code. No need to rescan. Let's go ahead and pull up these two examples. Now, two photographs, same question. Which of these is AI? Left or right? left or right. All right, want to take a look at the chart, see what the responses are coming in a little closer in this case, but a majority of you think the answer is in fact left here, though 5% of you were truthfully admitting that you're unsure. But Kelly, if you want to swipe back to the photos, the answer this time was in fact a trick question. They were both in fact AI, which perhaps speaks to just how good this technology is already getting. Neither of these faces exists in the real world. It was synthesized based on lots of training data. So, two photographs that look like humans but do not in fact exist. How about one more? This time focusing on text, which will be uh the focus, of course, underlying our duck. Did a fourth grader write this or the new chatbot? Here are two final examples. Uh same code as before, so no need to rescan. And here are the texts. Essay one. I like to bring a yummy sandwich and a cold juice box for lunch. And sometimes I'll even pack a tasty piece of fruit or a bag of crunchy chips. As we eat, we chat and laugh and catch up on each other's day. dot dot dot. C. Essay two. My mother packs me a sandwich, a drink, fruit, and a treat. When I get into a lunchroom, I find an empty table and sit there and eat my lunch. My friends come and sit down with me. dot dot dot. The question now, lastly, is which of these is AI? One or two? Essay one or two? The bars here are duking themselves out. Looks like a majority of you say essay one. 
Let's go back to the text. And one of you who says essay 1, why, if you want to raise a quick hand? Why essay one? Yeah. >> Okay. And so essay 2 looks more like you would write. And can I ask what grade you are in? >> A fifth grader. So is this a new fifth grader or not? The answer here in fact is that essay one is the AI, because indeed essay 2 is more akin to what a fourth or, if I may, a fifth grader would write. And I dare say there are maybe some telltale signs. I'm not sure a typical fourth grader or fifth grader would catch up on each other's day in the vernacular that we see in essay one. But suffice it to say this game is not something we can play in the years to come because it's just going to get too hard to discern something that's AI-generated or not. And so among our goals for today is really to give you a better sense of not just how technologies like this duck and these games that we've played here with images and text work, but really what are the underlying principles of artificial intelligence that frankly have been with us and have been developing for decades and have really now come to a head in recent years thanks to advances in research, thanks to all the more cloud computing, thanks to all the more memory and disk space and information, the sheer volume thereof, that we have at our disposal that can be used to train all of these here technologies. So that there duck is built on a fairly complicated architecture that looks a little something like this, where here's a student using one of CS50's tools.
Here's a website with which CS50 students are familiar called CS50.ai, where we, the staff, wrote a bunch of code that actually talks to what are called APIs, application programming interfaces, third-party services by companies like Microsoft and OpenAI that really have been doing the hard work of developing these models, as well as some local sauce that we, CS50, add into the mix to make the duck's answers specific to CS50 itself. But what we've essentially been doing is something with which you might be familiar in part, prompt engineering, which has started popping up for better or for worse on LinkedIn profiles everywhere. And prompt engineering really is not so much a form of engineering as it is a form of asking good questions and being detailed in your question, giving context to the underlying AI so that the answer with high probability is what you want back. And so there's two terms in this world of prompt engineering that are worth knowing about. CS50 has leveraged both of these to implement that duck. We, for instance, wrote what's called a system prompt, which are instructions written by us humans, often in English, that sort of nudge the underlying AI technology to have a certain personality or a specific domain of expertise. For instance, we, CS50, have written a system prompt essentially that looks like this. In reality, it's a lot of lines long nowadays, but the essence of it is this. You are a friendly and supportive teaching assistant for CS50. You are also a rubber duck. And that is sufficient to turn an AI into a rubber duck, it turns out. Answer student questions only about CS50 in the field of computer science. Do not answer questions about unrelated topics. Do not provide full answers to problem sets as this would violate academic honesty.
Answer this question, colon. And after that preamble, if you will, aka system prompt, we effectively copy-paste whatever question a student has typed in, otherwise known as a user prompt. And that is why the duck behaves like a duck in our case and not a cat or a dog or a PhD, but rather something that's been attenuated to the particular goals we have pedagogically in the course. And in fact, those of you who are CS50 students might recall from quite some weeks ago in week zero, when we first introduced the course to the class, we had code that we whipped up that day that ultimately looked a little something like this. And I'll walk through it briefly line by line. But now, on the heels of having studied some Python in CS50, this here code that I whipped up in the first lecture might now make a bit more sense. In that first lecture, we imported OpenAI's own library, code that a third-party company wrote to make it possible for us to implement code on top of theirs. We created a variable called client in week zero, and this gave us access to the OpenAI client, that is, software that they wrote for us. We then defined in week zero a user prompt, which came from the user using the input function with which CS50 students are now familiar. And then we defined this system prompt that day where I said limit your answer to one sentence. Pretend you're a dot dot dot cat, I think, was the persona of the day. And then we used some bit more arcane code here. But in essence we created a variable called response, which was meant to represent the response from OpenAI's server. We used client.responses.create, which is a function or method that OpenAI gives us that allows us to pass in three arguments.
The input from the user, that is, the user prompt; the instructions from us, that is, the system prompt; and then the specific model or version of AI that we wanted to use. And the last thing we did that day was print out response.output_text, and that's how we were able to answer questions like what is CS50 or the like. So we've seen all of that before, but we didn't talk about that week exactly how it was working or what more we could actually do with it. And so, in fact, what I thought we'd do today is peel back a layer that we've not allowed into the course up until now. And indeed, you still cannot use this feature until the very end of the class in CS50 when you get to your final projects, at which point you are welcome and encouraged to use VS Code in this particular way. So here again is VS Code. For those unfamiliar, this is the programming environment we use here with students. And let me open up some code that was assigned to students a couple of weeks back, namely a spell checker that they had to implement in C. So I came in advance with a folder called speller. And inside of this folder, I had code that day, and all students had that week, called dictionary.c. And in this file, which will not look familiar to many of you if you've not taken weeks 0 through 7 up until now, we did have some placeholders for students. So long story short, students had to answer a few questions, that is, write code to do this to-do, this to-do, this to-do, and one more. There were four functions or blanks that students needed to fill in with code. And I dare say it took most students 5 hours, 10 hours, 15 hours, something in that very broad range. Let me show you now how, using AI, you, the aspiring programmers, can soon start to write code all the more quickly, not by just choosing a different language but by using these AI-based technologies beyond the duck itself.
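The week zero program recapped above can be sketched roughly as follows, assuming the third-party openai Python package is installed and an API key is set in the OPENAI_API_KEY environment variable. The helper names build_request and ask are hypothetical, the cat persona follows the lecturer's own recollection, and the model name is deliberately left as a parameter because the lecture does not say which one was used.

```python
def build_request(user_prompt, model):
    """Assemble the three arguments described in lecture for responses.create."""
    # System prompt: instructions from us, the programmers.
    system_prompt = "Limit your answer to one sentence. Pretend you're a cat."
    return {
        "input": user_prompt,           # the user prompt
        "instructions": system_prompt,  # the system prompt
        "model": model,                 # which model to use (caller's choice)
    }


def ask(user_prompt, model):
    """Send the request to OpenAI and return the text of the reply."""
    # Imported here so build_request can be tried without the package installed.
    from openai import OpenAI

    client = OpenAI()  # reads the API key from OPENAI_API_KEY
    response = client.responses.create(**build_request(user_prompt, model))
    return response.output_text
```

A caller might then write print(ask(input("Prompt: "), model)) with model set to whichever model name their account can access. Keeping the prompt assembly in its own function makes it possible to inspect exactly what would be sent without making a network call.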
So what I've done here on the right-hand side of VS Code is enabled a feature that CS50 disables for all students from the start of the course, called Copilot. This is very similar in spirit to products from Google and Anthropic and other companies as well. But this is the one that comes from Microsoft and in turn GitHub here, and it too gives me sort of a chat window here, and this is just one of its features. For instance, if I wanted to implement, to get started, the check function, I could just ask it to do that. Implement the check function and, how about, using a hash table in C. I'm going to go ahead and click enter. Now it's going to work. It's using as reference, that is, context, the very file that I've opened, which is dictionary.c here. Copilot in general, as well as a lot of AI tools, is familiar with CS50 itself because it's been freely available as OpenCourseWare for years. What you see it doing here is essentially thinking, though that's a bit of an overstatement. It's not really thinking. It's trying to find patterns in what the problem is I want to solve among all of the training data that it's seen before and come up with a pretty good answer. So for today's purposes, I'm going to wave my hand at the ChatGPT-like explanation of what to do that has appeared at right. But what's juiciest to look at here is on the left. If I now scroll down, highlighted in green is all of the suggested code for implementing this here check function. Now, it might not be the way you implemented it yourself, but I do dare say this has hints of exactly what you probably did when it came to implementing a hash table. And in fact, I can go ahead and keep all of this code if I like how it looks. Let's assume that's all correct there. It might be the case that I want to now implement the load function. So how about now, implement load function, enter. As simple as that. And what data is being used? Well, a few different things. It says one reference.
So it's indeed using this one file. But there's also what are called comments in the code, with which all students are now familiar. These slash-slash comments in gray that are giving English hints as to what this function is supposed to do. There's implicit information as to what the inputs to these functions, otherwise known as arguments, are meant to be, what the outputs are meant to be. So the underlying AI, called Copilot here, kind of has a decent number of hints, and much like a good TA or good software engineer, that's enough context to figure out how to fill in those blanks. And so here too, if I scroll down now, we'll see in green some suggested code via which it could solve that same problem as well, the load function. And I dare say I've been talking for far fewer minutes than CS50 students spent actually coding the solution from scratch to this here problem. So I'll go ahead and click keep. I'll assume that it's correct. But that's actually quite a big assumption. And those of you wondering, like, why have we been learning all of this if I could just ask it in English to do my homework for me? I mean, there's a lot to be said for the muscle memory that hopefully you feel you've been developing over the past several weeks. The reality is, if you don't have an eye for what you're looking at, there's no way you're going to be able to troubleshoot an issue in here, explain it to someone else, make marginal changes, or the like. And yet, what's incredibly exciting even to someone like me, all of the staff, friends of mine in the industry, is that this kind of functionality in AI amplifies your capabilities as a programmer sort of overnight. Once you have that vocabulary, that muscle memory for doing it yourself, the AI can just take it from there and get rid of all of the tedium, allow you to focus at the whiteboard with the other humans on sort of the overarching problems that you want to solve, and leave it to this AI to actually solve problems for you.
A fun exercise too might be to go back at term's end and try solving any number of the course's assignments. For instance, let me go ahead and do this. In my terminal window here, I'm going to go back to my main directory. I'm going to create an empty file called mario.c that has nothing in it. And I'm going to go ahead in my chat window here and say: please implement a program in C that prints a left-aligned pyramid of bricks using hash symbols for bricks, and use the CS50 library to ask the user for a non-negative height as an integer. Period. I dare say that's essentially the English description of what was, for CS50 this year, problem set one: to implement a program called Mario. This too is sort of doing its thing. It's using one reference. It's working. It knows as a hint that this file is called mario.c, and it's seen a lot of those in its training data over time. There's an English explanation of what I should do. And those CS50 students in the room probably recognize the basic structure here: use a do-while loop to prompt the user for a height using the CS50 library, which has been included, print a left-aligned pyramid using some kind of loop, and boom, we are done. And these are fairly bite-sized problems. As you'll see when you get to term's end with your final project, which is a fairly open-ended opportunity to apply your newfound knowledge and savvy with programming itself to a problem of interest, it will allow you to implement far grander projects, far greater projects, than has been possible to date, certainly in just the few weeks we have to do it, because of this amplification of your own abilities. So with that promise, let's talk about how in the heck any of this is actually working. I clearly just generated a whole lot of stuff, and that's how we began the story, with the generation of those images and those two essays by kids. But what is generative artificial intelligence, or really, what is AI itself?
And these are some of the underlying building blocks that aren't going anywhere anytime soon and indeed have led us, as a progression, to the capabilities you just saw. So, spam. We sort of take for granted now that in our Gmail inboxes or Outlook inboxes, most of the spam just ends up in a folder. Well, there's not some human at Microsoft or Google manually labeling the messages as they come in, deciding spam or not spam. They're figuring out, using code and nowadays using AI, that this looks like spam and therefore should go in the spam folder, which is probably correct 99% of the time, but indeed there's potentially a failure rate. Other applications might include handwriting recognition. Certainly Microsoft and Google don't know the handwriting style of all of us here in this room, but they've been trained on enough other humans' handwriting styles that odds are your handwriting and mine look similar to someone else's. And so, with very high probability, they could recognize something like Hello World here as indeed that same digital text. All of us are into streaming services nowadays, Netflix and the like. Well, they're getting pretty darn good at knowing that if I watched X, I might also like Y. Why? Well, because of other things I've watched before and maybe upvoted and downvoted. Maybe because of other things people have watched who like similar movies or TV shows to me. So that too is AI. There's no if, else if, else if, else construct for every movie or TV show in their database. It's sort of figuring out much more organically, dynamically, what you and I might like. And then all these voice assistants today, Siri, Alexa, Google Assistant, and the like. Those too don't necessarily recognize your voice or know what questions you're going to ask them. There's no massive if, else if that has all possible questions in the world just waiting for you or me to ask it. That too, of course, is dynamically generated.
But that's getting a bit ahead of ourselves. Let's rewind in time. Some of the parents in the audience might remember this here game, among the first arcade games in the world, namely Pong. This was a black-and-white game whereby there are two players, a paddle on the left, a paddle on the right, and then using some kind of joystick or trackball, they can move their paddles up and down, and the goal is to bounce the ball back and forth and ideally catch it every time. Otherwise, you lose a point. This is just an animated GIF, so there's nothing really dramatic to watch. It's going to stay at 15 against 12, just looping again and again. Nothing interesting is going to happen, but this is a nice example of a game that lends itself to solving it with code. And indeed, it's been in our vernacular for years to play against not just the computer, but the CPU, the central processing unit, or really the AI. And yet, AI does not need to be nearly as sophisticated as the tools we now see. For instance, here's a successor to Pong known as Breakout. It's similar in spirit, but there's just one paddle and one ball, and the goal is to bounce the ball off of these colorful bricks, and you get more and more points depending on how high up you can get the ball. All of us as humans, even if you've never played this old-school game, probably have an instinct as to where we should move the paddle. If the ball just left it going this way, which direction should I move the paddle? I mean, probably to the left. And indeed, that'll catch it on the way down. So you and I just made a decision that's fairly instinctive, that's been ingrained in us, but we could sort of take all the fun out of the game and start to quantify it or describe it a little more algorithmically, step by step. In fact, decision trees are a concept from economics, strategic thinking, and computer science as well.
That's one way of solving this problem in such a way that you will always play this game well if you just follow this algorithm. So, for instance, how might we implement code, or a decision-making process, for something like Breakout? Well, you ask yourself first, is the ball to the left of the paddle? If so, you know where we're going: go ahead and move the paddle left. But what if the answer were no? Well, you probably don't just blindly move the paddle to the right. What should you then ask? >> Are we right below the ball? >> Are you right below the ball? If the ball's coming right at you, you don't want to just naively go to the right and then risk missing it. So there's another question to ask. Is the ball to the right of the paddle? And that's a yes-no question. If yes, well then okay, move it to the right. But if not, you should probably stay exactly where you are and not move the paddle. All right, so that's fairly deterministic, if you will. And we can map it to code using pseudocode in, say, a class like CS50. We can say, in a loop: while the game is ongoing, if the ball's to the left of the paddle, then move the paddle left. Else if the ball's to the right of the paddle, sorry for the typo there, move the paddle right. Else just don't move the paddle. And so these decision trees, as we drew it, have a perfect mapping to code, or really pseudocode in this particular case, which is to say that's how the people who implemented the Breakout game or the Pong game, who implemented a computer player, surely coded it up. It was as straightforward as that. But how about something like tic-tac-toe, which some of you might have played on the way in, for just a moment, on the scraps of paper that you might have had. Here we have a tic-tac-toe board with two O's and two X's. For those unfamiliar, this game tic-tac-toe, otherwise known as noughts and crosses, is a matter of going back and forth, X's and O's, between two people.
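Looking back at the Breakout decision tree before moving on to tic-tac-toe: the three-way choice described above maps almost word for word onto code. What follows is only a minimal Python sketch, not the code of any actual Breakout or Pong player. The names `paddle_move` and `track_ball` are hypothetical, positions are assumed to be whole numbers on a horizontal axis, and the ball's horizontal position is assumed to hold still while the paddle catches up.

```python
def paddle_move(ball_x, paddle_x):
    """Decision tree from the lecture: left, right, or stay put."""
    if ball_x < paddle_x:        # Is the ball to the left of the paddle?
        return "left"
    elif ball_x > paddle_x:      # Is the ball to the right of the paddle?
        return "right"
    else:                        # Directly below the ball: don't move.
        return "stay"


def track_ball(ball_x, paddle_x, max_steps=100):
    """Repeat the decision in a loop, moving the paddle one unit per step.

    Assumption for illustration: integer positions and a ball whose
    horizontal position does not change while the paddle moves.
    """
    moves = []
    for _ in range(max_steps):
        move = paddle_move(ball_x, paddle_x)
        if move == "stay":
            break
        paddle_x += -1 if move == "left" else 1
        moves.append(move)
    return paddle_x, moves


if __name__ == "__main__":
    print(track_ball(ball_x=2, paddle_x=5))
```

Every branch here is written out by hand, which is exactly why this approach works for Breakout and stops scaling for games with many more situations to consider.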
And the goal is to get three O's in a row or three X's in a row, either vertically, horizontally, or diagonally. So this is a game here in mid-progress. Well, let's consider how you could solve the game of tic-tac-toe like a computer, like an AI might. Well, you could ask yourself, can I get three in a row on this turn? If yes, well, play in the square to get three in a row. It's as straightforward as that. If you can't, though, what should you ask? Well, can my opponent get three in a row on their next turn? Because if so, you should probably at least block their move next, so at least you don't lose. But this game, tic-tac-toe, relatively simple as it is, gets a little harder to play when it's not obvious where you should go. Now, all of us as humans, if you grew up playing this game, probably had heuristics you used, like you really like the middle or you like the top corner or something like that. So we can probably make our next move quickly, but is it optimal? And I dare say, if back in childhood or more recently you've ever lost a game of tic-tac-toe, like, you're just bad at tic-tac-toe, because logically there's no reason you should ever lose a game of tic-tac-toe if you're playing optimally. At worst you should force a tie, but at best you should win the game. So think of that the next time you play tic-tac-toe and lose: you're doing something wrong. But in your defense, it's because the question mark is sort of not obvious. Like, how do I answer it when the answer is not right in front of me, to move for the win or move for the block? Well, one algorithm you could have been using all of these years is called minimax. And as the name suggests, it's all about minimizing something and/or maximizing something else. So here too, let's take a bit of fun out of the game and turn it into some math, but relatively simple math. So here we have three representative tic-tac-toe boards. O has won here, X has won here, and the middle is a tie.
It doesn't matter how we score these boards, but we need a consistent system. So I'm going to propose that anytime O wins, the score of the game is negative one. Anytime X wins, the score of the game is positive one. And anytime nobody wins, the score is zero. So at this point each of these boards has one of these values: negative one, zero, or one. The goal in this game of tic-tac-toe, therefore, is now for X to maximize its score, because one is the biggest value available, and O's goal in life is to minimize its score. So that's how we take the fun out of the game. We turn it into math, where one player just wants to maximize and one player just wants to minimize their score. All right, so a quick sanity check here. Here's a board. It's not color-coded. What is the value of this board? >> One, because X has in fact won straight there down the middle. So a win for X is one, a win for O is negative one, and otherwise a tie is zero. So now let's see how, with those principles in place, we go about figuring out where we should play in tic-tac-toe. Now, here's a fairly easy configuration. There are only two moves left. It's not hard to figure out how to win or tie this game, but let's use it for simplicity. It's O's turn, for instance. So where can O go? Well, that invites the question, what is the value of the board? Or how do we minimize the value of the board for O to win? Well, O can go in one of two places, top left or bottom middle. Which way should O go? Well, if O goes in top left, we should consider what's the value of this board. Is it minimal? Well, let's see. If O goes here, X is obviously going to go here. X is therefore going to win. So the value of this board is going to be a one. Now, since there's only one way logically to get from this configuration to this one, we might as well call the value of this board, by transitivity, one. And so O probably doesn't want to go there, because that's a pretty maximal score and O wants to minimize.
Over here, though, if O goes bottom middle, well then X is going to go top left. And now no one has won. So the value of this board is thus >> zero. We might as well treat this as zero, because that's the only way to get there logically. So now O, more mathematically and logically, can decide: do I want an end point of one or an end point of zero? Well, zero is probably the better option, because that's less than one and thus it's the minimal possibility. So O is going to go ahead in the bottom middle and at least force a tie. And so that's where you see evidence that if you humans are ever losing the game of tic-tac-toe, you have not followed that there logic. But you could probably do it if there are just two moves left. The catch is, let's go ahead and rewind to three moves left here. There are three blanks, and I've kind of zoomed out. The catch is that the decision tree gets a lot bigger the more moves are left. It gets sort of bigger and bushier, in that it's essentially doubling in size and width. And that's great if you have the luxury of writing it down on a piece of paper. But if you're doing this in your head while playing against a fifth grader, if I may, you're probably not drawing out all of the various boards and configurations, trying to play it optimally. You're going with some instinct. And your instincts might not be aligned with an algorithm that is tried and true, minimax, that will ideally get you to win the game, but at least will get you to force a tie if you can't win. But tic-tac-toe is not that hard. I mean, how many different ways are there to play tic-tac-toe? You could write a computer program to pretty much play tic-tac-toe optimally. We could use code like this: if the player is X, for each possible move, calculate the score for the board at that point in time, and then choose the move with the highest score. So you just try all possibilities mathematically and then you make the decision.
Most of us in our heads are not doing that, but we could. Else, if the player is O, essentially do the same thing, but choose the move with the minimal possible score. So that's the code for implementing tic-tac-toe. How many ways are there to play tic-tac-toe, though? Well, 255,168, which means if we were to draw that tree, it would be pretty darn big, and it would take you quite a bit of time to think through all those possibilities. So, in your defense, you're maybe not that bad at tic-tac-toe. It's just harder than you thought as a game. But what about games with which we might, as adults, be more familiar? Well, what about the game of chess, which is often used as a measure of how smart a computer is, whether it's Watson back in the day playing against it or something else? Well, consider even just the first four moves of chess, whereby I mean black goes and white goes, and then they each go three more times. So, four pairwise moves. How many different ways are there to play chess? Well, it turns out 85 billion, just to get the game started. And that's a lot of decisions to consider and then make. How about the game of Go, if familiar? Consider the first four moves: 266 quintillion possibilities. And this is where we as humans, even with our modern PCs and Macs and phones, kind of have to throw up our hands, because I don't have this many bytes of memory in my computer. I don't have this many hours left in my life to actually crunch all of those numbers and figure out the solution. And so where AI comes in is where it's no longer as simple as just writing if-elses and loops, and no longer as simple as just trying all possibilities. You instead need to write code that doesn't solve the problem directly but, in some sense, indirectly. You write code so that the computer figures out how to win.
You might do that by showing it configurations of the board that are a good place to be in, that is, promising, and maybe showing it boards that it doesn't want to find itself in, because those are going to lead it to lose. In other words, you train it, but not necessarily exhaustively. And this is what we mean nowadays by machine learning: writing code via which machines learn how to solve problems, generally by being trained on massive amounts of data and then, in new problems, looking for patterns via which they can apply that past training data to the problem at hand. And reinforcement learning is one way to think about this. In fact, we as humans use reinforcement learning, which is a type of machine learning, sort of all of the time. A fun demonstration to watch here involves these here pancakes. So, in fact, let me go ahead and pull up a short recording of an actual researcher in a lab who's trying to teach a robot how to flip pancakes. We'll see here in this video that the robot has an arm that can go up, down, left, right. This, of course, is the human, the researcher, and he's just going to show the robot one or more times how to flip a pancake, and he crosses his fingers, and okay, seems to have done it well. Does it again. Not quite the same, but pretty good. And now he's going to let the robot just try to figure out how to flip that pancake after having just trained it a few different times. The first few times, odds are the robot's not going to do super well, because it really doesn't understand what the human just did or what the whole purpose is. But, and here's the key detail with reinforcement learning, behind the scenes the human is probably rewarding the robot when it does a good job. The better and better it flips, the more it gets rewarded, as by hitting a key and giving it a point, for instance, or giving it the digital equivalent of a cookie.
Or conversely, every time the robot screws up and drops the pancake on the floor, it gets sort of a proverbial slap on the wrist, a punishment, so that it does less of that behavior the next time. And any of you who are parents, which by definition today many of you are, odds are, whether it's this or maybe just verbal approval or reprimands, you have probably trained children at some point to do more of one thing and less of another. And what you're seeing in the backdrop there is now just a quantization of the movements, X, Y, and Z coordinates, so that it can do more of the X's and the Y's and the Z's that led it to some kind of reward. And now, after you're up to some 50 trials, the robot seems to be getting better and better, such that, like a good human, we'll see if I can do this without embarrassing myself, it can flip the thing. That's pretty good. I've been doing this a long time. Okay, [applause] so we've seen then how you might reinforce learning in that kind of domain. Let's take an example that's familiar to those of you who are gamers. Anytime you've played a game where there's some kind of map or a world that you need to explore, up, down, left, right, maybe you're trying to get to the exit. So here, simplistically, is the player at the yellow dot. Here, for instance, in green is the exit of the map, and you want to get to that point. And maybe somewhere else in this world there are a lot of lava pits, and you don't want to fall into a lava pit, because you lose a life or you lose a point, or there's some penalty or punishment associated with that. Well, we, with this bird's-eye view, can obviously see how to get to the green dot. But if you're playing a game like Zelda or something like that, all you can do is move up, down, left, right, and sort of hope for the best. So let's do just that. Suppose the yellow dot just randomly chooses a direction and goes to the right.
Well, now we can take away a life, take away a point, or effectively punish it so that it knows: don't do that. And so long as the player, either the human player or the code that's implementing this, has a bit of memory, shown here just with a dark red line, that means don't do that again, because that didn't lead to a good outcome. So maybe the next time the yellow dot goes this way and this way and then, ah, didn't realize that that's actually the same lava pit. But that's fine. Use a little bit more memory and remind me: don't do that, because I just lost a second life in this story. And maybe it goes this way next time. Ah, now I need to remember, don't do that. But effectively, I'm either being punished for doing the wrong thing or, as we'll soon see, being rewarded for doing more of the successful thing. And just by chance, maybe I finally make my way to the exit in this way. And so I can be rewarded for that. Now I got 100 points or whatever it is, the high score. So now, as per these green lines, I can just follow that path again and again, and I can always win this game. It's kind of like me nowadays, like 30 years later, playing Super Mario Brothers, because I can get through all the warp levels, because I know where everything is, because for some reason that's still stored in my brain. Is this the best way to play? Am I as good at Super Mario Brothers as I might think? What's bad about this solution? Yeah. >> Exactly. Yeah. I've moved many more times than I need to. And just for fun today, what grade are you in? >> Uh, seventh. >> Seventh grade. Wonderful. So now a seventh-grade observation is exactly that: we could have taken a shorter path, which is essentially that way, albeit making some straight moves. And so we're never going to find that shorter path, we're never going to get the highest score possible, if I just keep naively following my well-trodden path. And so how do we break out of that mold? And you can see this even in the real world.
Another sort of personal example is that I'm the type of person, for some reason, where if I go to a restaurant for the first time, choose a dish off the menu, and really like it, I will never again order anything else off that menu other than that dish, because I know it is good. But there could be something even better on the menu, and I'm never going to explore that because I'm sort of fixed in my ways, as some of you, from the smiles, might be too. But what if we took advantage of exploring just a little bit? There's this principle of exploring versus exploiting when it comes to using artificial intelligence to solve problems. Up until now, I've just been exploiting knowledge I already have. Don't go through the red walls. Do go through the green walls. Exploit, exploit, exploit, and I will get to a final solution. But what if I just sprinkle in a little bit of randomness along the way, and maybe 10% of the time, as represented by this epsilon variable, I, as the computer in the story, generate a random number between zero and one. And if it's less than epsilon, which is going to happen 10% of the time, I'm going to make a random move instead of one that I know will get me closer to the exit. Otherwise, I'll indeed make the move with the highest value. Now, this isn't necessarily going to win me the game that first time, but if I play it enough and enough and enough and insert some of this randomness, I might very well find a better solution and therefore be a better player, a better winner overall. If I just ordered something else off the menu 10% of the time, I might find that there's an amazing dish out there that I otherwise wouldn't have discovered. And so indeed, using that approach, we can finally find a more optimal path through the maze, which was shorter there, presumably therefore maximizing our score and doing even better than we might have by just exploiting the same knowledge.
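The explore-versus-exploit rule just described is short enough to write down. The following is a minimal Python sketch under stated assumptions: the function name `choose_move`, the `values` dictionary of estimated move values, and the optional `rng` parameter are all hypothetical and do not come from the lecture's slides. Only the logic, a random number below epsilon triggers a random move and otherwise the highest-valued move is taken, follows the description above.

```python
import random


def choose_move(values, epsilon=0.1, rng=random):
    """Pick a move given estimated values for each available move.

    With probability epsilon, explore by choosing a move at random;
    otherwise exploit by choosing the move with the highest value.
    """
    if rng.random() < epsilon:              # random number between 0 and 1
        return rng.choice(list(values))     # explore: any move at all
    return max(values, key=values.get)      # exploit: best-known move


if __name__ == "__main__":
    # Hypothetical value estimates for the four directions in the maze.
    values = {"up": 0.9, "down": 0.1, "left": 0.4, "right": -1.0}
    rng = random.Random(50)                 # seeded so runs are repeatable
    picks = [choose_move(values, epsilon=0.1, rng=rng) for _ in range(1000)]
    print({move: picks.count(move) for move in values})
```

Setting epsilon to zero recovers the pure "exploit, exploit, exploit" player, and raising it trades some short-term score for a better chance of stumbling onto the shorter path.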
So you can see this even in the game of Breakout, especially if you write a solution in code to play this game for you. Let me go ahead and pull up another video recording of an AI playing Breakout. And what this AI is doing is essentially figuring out, maybe more intelligently than you or I could, how to play this game optimally. And what we'll see here is that, just like the pancake-flipping robot, there's some notion of scoring and rewards and penalties here. So right now, the paddle is just doing random stuff. It doesn't really know how to play the game yet, but it realizes after 200 episodes that, oh, my score goes up if I hit the ball and it goes down, equivalently, if I miss it. And it's still a little twitchy. It doesn't quite understand what it's supposed to do and why. But if you do it again and again and again, and it's rewarded and/or punished enough, you'll see that it starts to get pretty good and closer to what a good human might do. But here's where the algorithm gets a little creepy. If you let it play long enough, or if you and I, the humans, play long enough, you might find a certain trick to the game. I dare say the AI becomes a bit scarily sentient, in that it turns out, if you're smart enough to break through that top row, you can let the game just play itself for you and maximize your score without even touching the ball. That's something I do find a little creepy, that it just figured out how to do that without being told. But it's just a logical continuation of rewarding it for good behavior and punishing it for bad behavior. So the next time you have an occasion to play Breakout, consider that kind of strategy: as opposed to doing more of the work yourself, let the computer do it for you instead. Well, what else is there to consider in this world of AI, in the context of machine learning? Well, there's specifically a category of learning that's supervised. And we've been using this for years.
And in fact, our first example of spam early on was certainly supervised. Why? Because it was you and I who were putting the email into the spam folder. To this day, maybe once a day, I hit the keyboard shortcut in Gmail to say, "Ah, this is spam. You should have caught this." And that is training Google's algorithm further, assuming it's not just little old me, but maybe thousands of people tagging that same kind of email as spam. That's supervised learning, in that there's a human in the loop doing at least something. So spam detection might be one of those. But the catch is that labeling data in that way, manually, just doesn't scale very well. That would be akin to having someone at Google or Microsoft labeling every email, or someone at Netflix doing the same for all of the videos out there. It's expensive in terms of human power. And there are certainly problems out there with so much data that it's just not realistic for humans to label millions of pieces of data, billions of pieces of data. We've got to move to an unsupervised model. And so this is where the world starts to consider deep learning, solving problems using code whereby you don't even have humans in the loop in quite the same way. And neural networks, inspired by the world of biology, are what underlie the state of the art, even today's rubber duck and, more generally, these things called large language models, like ChatGPT and the like. So here, pictured somewhat abstractly, is a neuron, and it's something in the human body that transmits a signal, say from left to right, electrically. And if you have multiple neurons, you can intercommunicate among them, so that if I think a thought, then I know how to raise my hand, because some kind of message has gone electrically from my head to this extremity here. So that's in essence what I remember from nth-grade biology. But as computer scientists, we sort of abstract all of this away.
So instead of drawing these two as neurons, let's just start drawing neurons as these little circles. And if they have connective tissue of sorts between them, we'll just draw a straight line, an edge, between them. So this is what a computer scientist would call a graph. If you have two such neurons over here leading to one neuron here, you can think of this as being like maybe two inputs to a problem and now one output. There too, we can represent the notion of problem solving, which is what CS50 and intro courses more generally are all about. So let's solve a problem with a neural network without necessarily training it in advance, just letting it figure out how to answer this question. Here's a very simple two-dimensional world, an XY grid, and here are two dots. And the dots in this world are either blue or red. But I have no idea yet what makes a dot blue or red. However, if you train me on those two dots, I bet I could come up with predictions, especially if you let me label this world in terms of x-coordinates on the horizontal and y-coordinates on the vertical. And then, you know what? We can think of this neural network very simply as representing the x-coordinate here, the y-coordinate here, and the answer I want to get is, quote unquote, red or blue, or zero or one, or true or false, however you want to think of the representation. So how do I get from a specific x, y coordinate to a prediction of color if I only know the coordinates? Well, from the get-go, maybe the best I can do is just divide the world into blue dots on the left and red dots on the right. A best-fit line, if you will, based on very minimal data. Of course, if you give me a third dot, it's going to be pretty easy to realize that I was a little too hasty. That line is not vertical. So maybe we pivot the line this way. And now I'm back in business. Now I can predict with higher probability, based on x and y, what color the next dot will be.
You give me enough of these dots, and I can come up with a pretty good best-fit line. It's not perfect, and here's a hint at why AI is not perfect, but 99% of the time, maybe, I'll be able to predict correctly. And I can do even better if you let me squiggle the line a little bit and maybe make it more than just a simple slope. So what is it we're really doing in implementing this neural network, albeit simplistically with just three neurons? Well, essentially, we're trying to come up with three values, three parameters: an a, a b, and a c. And what do those represent? Well, really just a solution to this formula. That there line we drew can be represented, if you think back to high school math, with a formula along these lines, whereby it's a times x, plus b times y, plus some constant c. And we can just arbitrarily conclude that if that value mathematically gives me a number greater than zero, predict it's going to be blue; otherwise, predict it's going to be red. We can sort of map our mathematics, just like with tic-tac-toe, to the actual problem we care about by defining the world in this way. And so if you give me enough data points, I can come up with answers for that a, that b, that c, the so-called parameters in neural networks. Now, in reality, neural networks are not composed of, like, three neurons and a couple of edges. They look a little something more like this. And in practice, they've got billions of these things here on the screen, in which case pretty much every one of these edges represents some mathematical value that was contrived based on lots and lots of training data. And I, the computer scientist, might know what these neurons over here represent, because those are my inputs, three in this case, and I know what this one represents at the end.
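To make the three-parameter idea concrete, here is a small Python sketch of the rule "if a times x plus b times y plus c is greater than zero, predict blue, otherwise red." The lecture does not say how a, b, and c are found, so the `train` function below swaps in the classic perceptron update rule purely for illustration. The names `predict` and `train`, the learning rate, and the six sample dots are all assumptions made up for this example.

```python
def predict(a, b, c, x, y):
    """Lecture's decision rule: blue if a*x + b*y + c > 0, else red."""
    return "blue" if a * x + b * y + c > 0 else "red"


def train(points, epochs=1000, rate=0.1):
    """Find a, b, c with the perceptron rule (an illustrative stand-in).

    points is a list of (x, y, color) tuples. Each time a dot is
    misclassified, nudge the line toward classifying it correctly.
    Stops early once a full pass makes no mistakes.
    """
    a = b = c = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y, color in points:
            if predict(a, b, c, x, y) != color:
                direction = 1 if color == "blue" else -1
                a += rate * direction * x
                b += rate * direction * y
                c += rate * direction
                mistakes += 1
        if mistakes == 0:
            break
    return a, b, c


if __name__ == "__main__":
    # Made-up dots: blue toward the left, red toward the right.
    dots = [(1, 1, "blue"), (2, 3, "blue"), (1, 4, "blue"),
            (5, 2, "red"), (6, 4, "red"), (7, 1, "red")]
    a, b, c = train(dots)
    print(a, b, c)
    print(predict(a, b, c, 0, 2), predict(a, b, c, 8, 2))
```

With only three numbers to learn, every parameter still has a readable meaning: a and b tilt the line and c shifts it. That interpretability is what gets lost once there are billions of such values.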
But if you took the hood off of this thing and looked inside the neural network, even though there'd be millions or billions of numbers going on there, I can't tell you what this neuron represents or why this edge has this weight. It's because of the massive amount of training data that that's just how the math works out. And if you feed me more data, I might change some of those parameters more. So the graph ultimately might look quite different, but my inputs and my outputs are going to be what I use to solve that there problem. So if you want to predict, like, rainfall from humidity or pressure, you can have two inputs giving that one output. Advertising dollars spent in a given month might predict sales, just by having trained again on such volumes of data. And when we now get full circle to something like CS50's rubber duck and large language models like Claude and Gemini and ChatGPT, what's really happening? This is all hot off the press in recent years; screenshotted here are some of the recent research papers that have driven a lot of this advancement. You have from OpenAI, say, a generative pre-trained transformer, which is a lot to say, but there's the GPT in ChatGPT. And essentially this is a neural network that's been trained on large volumes of textual information, and that gives us the interactive chat feature that we have in the class and that we all have more generally in ChatGPT itself. So, an example of what is actually happening underneath the hood of these GPTs. Well, here's a paragraph that, up until recent years, was kind of a hard paragraph to end at the dot dot dot: Massachusetts is a state in the New England region of the northeastern United States. It borders on the Atlantic Ocean to the east. The state's capital is dot dot dot. Now, most anyone living in Massachusetts probably knows that answer.
But if this AI has just been trained on lots and lots of data, there's probably a lot of people who say Massachusetts in part of a sentence, and then the answer, which I won't say yet, is in the other part of the sentence. But in this example, given that the question we're asking is sort of so far from some of the useful keywords, up until recently this was a hard problem to solve because there was so much distance. Moreover, there are these nouns that are being used to substitute for the proper noun. Like, we suddenly start calling it a state, and we call it a state down here. And it wasn't necessarily obvious to AIs that we're talking about the same thing, as if it were just city, state, where you'd have much more proximity. So in a nutshell, what we now do, especially to solve problems like these, is we first break down a sentence, or the training data or input alike, into like an array or a list of the words themselves. We come up with a representation of each of these words. For instance, the word Massachusetts, if you encode it in a certain way, is going to be represented with an array or vector of numbers, floating-point values. So many so that the word Massachusetts in one model would use these 1,536 floating-point numbers to represent Massachusetts, essentially in an n-dimensional space. So not just an xy-plane, but somewhere sort of virtually out there. And then, and this has been the key to these GPTs, an attention is calculated based on all of that data, whereby in this picture the thicker lines imply more of a relationship between those two words. So Massachusetts and state are inferred as having a thicker line, a higher attention from one word to the other, whereas our a's and our is's and our the's have thinner lines because they're just not as much signal to the AI as to what the answer to this question is. Meanwhile, when you then feed in that sentence, like "the state's capital is," one word per neuron here, the goal is to get the answer to that question.
And even here, this is way smaller of a representation than the actual neural network would be. But in effect, all these LLMs, large language models, are just statistical models. Like, what is the highest-probability word that it should spit out at the end of this paragraph based on all of the Reddit posts and Google search results and encyclopedias and Wikipedias that it's found and trained on online? Well, the answer hopefully will be Boston. But of course, 1% of the time, maybe less than that, the answer might not be correct. And even CS50's own duck is fallible, even though we've written lots of code to try to put downward pressure on those mistakes. And those mistakes are what we'll call, lastly, hallucinations, where the AI just makes something up, perhaps because some crazy human on the internet made something up and it was interpreted as authoritative, or just by bad luck, because of a bit of that exploration: 10% of the time, 1% of the time, the AI sort of veered this way in the large language model, in the neural network, and spit out an answer that just in fact is not correct. And so I thought I'd end for today on this final note, a poem with which many of us might have grown up, from Shel Silverstein here, about the homework machine, which years ago somehow sort of predicted the state we would be in with these AI machines. He said, "The homework machine, oh, the homework machine, most perfect contraption that's ever been seen. Just put in your homework, then drop in a dime, snap on the switch, and in 10 seconds' time, your homework comes out quick and clean as can be. Here it is. 9 + 4, and the answer is three. Three. Oh, me. I guess it's not as perfect as I thought it would be." This then was CS50. See you next time. [applause] [music]

>> All right, this is CS50 and this is already week 8.
And up until now, of course, in so many of our problem sets, we've been writing command-line code, like a black-and-white terminal window, and everything is very keyboard-based, very textual. But of course, the apps that you and I are using every day are in the form of a web browser and on our phones. And so today, and really for the rest of the semester, we now transition to using all of the building blocks that we've been accumulating over the past few weeks, but redeploying them in the context of web apps and, for your final project for instance, if you so choose, even mobile apps as well. So today we're going to understand how the internet that we use every day actually works. We're going to introduce you to a language called HTML, which is the language in which web pages are written; a language called CSS, which is the language with which web pages are stylized; and then lastly JavaScript, which of those is the only actual programming language. But even though we'll spend quite little time on it, you'll see that syntactically and functionally it's very similar to C, to Python, and to languages indeed that have come before. All right. So we use the internet every day. So what exactly is it? Well, in the simplest form, we've got networks in the world, and networks are interconnections of computers, whether with wires or wirelessly. You have a network at home nowadays, for the most part. You certainly have a network on a campus like this. In corporations, you have networks. So, interconnections of computers. As soon as you start networking the networks, if not networking the networks of networks, you have in effect the internet. So, this global interconnection of computers, servers, devices, and so many other things that we literally take for granted nowadays every day. But how does it actually work and where did it come from?
Well, if we rewind to like 1969, the internet in its original form was really something known as ARPANET, for the Advanced Research Projects Agency, a project from the Department of Defense that was really designed to interconnect what limited supercomputers we had back then, which were otherwise geographically inaccessible to so many researchers and others. The internet, or ARPANET, really just looked like this, with UCLA and just a few other nodes, so to speak, interconnected somehow. Just a year or so later did we have Harvard and MIT and others on the East Coast. And if we fast forward now to today, of course, we can find and route data most anywhere in the world. And in fact the world is now filled with these things called routers. A router is just a computer, a server, that routes data up, down, left, right geographically. And of course in the real world it might go out this wire here, out this wire here, out this wire, or out this wire. And in fact, just to make more real what we're about to be talking about when we talk about networks of computers and eventually the internet, we engaged some of our teaching fellows over the past few years to perform a little skit of sorts for us using Zoom, if you will, whereby each of the teaching fellows, or humans you're about to see, should be considered as representing a router, a device on the internet whose purpose in life is to route data. And what they're routing is what we're going to start calling packets, packets of information, which metaphorically you can think of as just like a little white envelope like this that we use to send things via snail mail, via the US Postal Service or, beyond that, internationally. So I give you in just 60 seconds or so what it means to send a packet on the internet, for instance from Phyllis in the bottom right-hand corner to a familiar face, Brian, at top left. If we could dim the lights, if only to be dramatic. [music] >> Thank you. Sure, we can clap for that.
And we actually should clap for that because you're seeing the sort of final version, which looked kind of perfect, but they were all smiling and clapping because it took us so many damn takes to actually get the coordination of that correct. But for now, assume that it was in fact correct. But notice that among the takeaways from even that little skit is that the packet, the envelope from Phyllis to Brian, could have taken any number of paths. It could have gone up and then to the left. It could have gone left and then up. It could have zigzagged and the like. And that's actually representative of how the world now looks. Because of so many wires and so many wireless connections, there are actually a lot of ways that data can travel from point A to point B. And it turns out it's not even necessarily going to be the shortest distance. It might be the least expensive distance, or perhaps just the result of how some humans or some servers have automatically configured the routes to get from point A to point B. So let's consider how the data is actually getting there. Long story short, all of those routers, and indeed all devices on the internet, including the ones in your pocket or on your laps, speak a language, more technically a protocol, nowadays known as TCP/IP. And this is actually a pair of protocols, where a protocol is a set of conventions that governs how computers behave on the internet. In the human world, we have protocols as well. For instance, when I meet someone for the first time, I very often instinctively sort of extend my hand, just sort of hoping that they too will extend their hand and shake. And that's a human protocol in that it governs how two people, in that case, intercommunicate. Well, servers have the same kinds of protocols, but it's all text-based or bit-based instead of, of course, physical. But TCP and IP are two different protocols that solve two different problems. And let's focus on the last of them first.
So IP, short for Internet Protocol, is simply a protocol that decides to give all of us a unique address in the world. In other words, there are these things called IP addresses. It's a numeric address that literally every computer in the world has in order to uniquely identify it. Case in point, in the real world, we have addresses too. For instance, in this building here, Memorial Hall, we're at 45 Quincy Street, Cambridge, Massachusetts 02138 USA. And theoretically that unique identifier should get an envelope in the physical world to this location from any other in the real world. IP as applied to the internet just means that, similarly, devices, Macs, PCs, phones, and everything else on the internet have a unique identifier as well, known as an IP address. It's a number, but it's typically formatted in dotted decimal notation, so to speak. So it's something dot something dot something dot something. And just as a bit of trivia, each of these number signs represents a value from 0 to 255. So there are four such values, apparently. And just doing some quick week zero math, if each of those values can be 0 to 255, how many bits is an IP address, presumably? >> So eight bits per number. And how many was this? >> So 32 bits, because if you're counting from 0 to 255, well, that's 256 total possibilities. That's two to the eighth, which means 8 bits. 8 bits. 8 bits. 8 bits. So IP addresses are 32 bits. Little trivia that's germane only insofar as it does kind of limit how many total devices we could seem to have in the world. If you've got only 32 bits, how high can you count? Roughly >> two. >> So two to the 32nd power, which we've generally ballparked as 4 billion, which is to say you can have 4 billion devices total, it would seem, on the internet, which is a big number. But there's also a lot of humans nowadays, and odds are most everyone in this room has at least two devices to their name. Maybe a phone and a laptop with which you're taking the course.
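To make that arithmetic concrete, here is a small Python sketch that packs four values from 0 to 255 into a single 32-bit number and cross-checks the result against the standard library's `ipaddress` module. The function name `dotted_to_int` and the sample address are illustrative assumptions.

```python
import ipaddress


def dotted_to_int(address: str) -> int:
    """Combine four 0-255 values (8 bits each) into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a dotted-decimal IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # shift left 8 bits, then append the next 8
    return value


if __name__ == "__main__":
    sample = "1.2.3.4"  # illustrative address only
    print(dotted_to_int(sample))
    print(int(ipaddress.IPv4Address(sample)))  # the standard library agrees
    print(2 ** 8, 2 ** 32)  # values per number sign, and total possible addresses
```

Two to the 32nd power is 4,294,967,296, which is where the "roughly 4 billion" ballpark comes from.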
Maybe even more devices, thanks to the internet of things, like smart home devices. We have so many IP addresses being assigned to things. So long story short, the world is gradually transitioning from this version here, IPv4, to IPv6, which instead of using 32 bits is actually using 128 bits, which is crazy large and gives us more than enough IP addresses for the foreseeable future. To be fair, we've been talking about this for like 20, 30 years, transitioning from v4 to v6, and it's still gradually in motion. But for simplicity in the class and in general, we'll still use IPv4, if only because it's a little easier to wrap your mind around. Now, this is admittedly a pretty arcane diagram. But this is the diagram, ASCII art if you will, that's in the official specification of what we mean by an IP datagram. More colloquially, this is what a packet actually looks like. Now, what are we looking at? Well, you're just looking at like a grid of bits. So this here represents 32 bits total, where this is bit zero and that's bit 31, zero-indexed, all the way over there. And then each row represents 32 more bits. 32 more bits. 32 more bits. Which is to say, anytime a computer like Phyllis's sends an envelope of information on the internet, it contains at least this information, a whole bunch of bits broken down into bytes. Now, the only ones we'll really care about today are this one here, source address, which is to say when Phyllis sends that packet, she writes her source address, her IP address, something, on the outside of the envelope, so to speak. And she also puts Brian's IP address, whatever that is, something else, on the outside of the envelope as well. There's a whole bunch of other bits involved which are useful, but we'll wave our hands at those for today. But that really speaks to what's actually happening. And if we do this metaphorically in the real world, it's kind of like taking out that envelope.
And for instance, if Brian's IP address is 1.2.3.4 for the sake of discussion, Phyllis, in advance of our filming that bit, would have written something like 1.2.3.4 in the middle of the envelope, just like we would in the real world. But presumably, she wants Brian to be able to reply, to acknowledge receipt or send his own message. So, she's also going to put her IP address, for instance, in the top left corner of the envelope, 5.6.7.8 for the sake of discussion, so that Brian knows, when he writes out his own packet of information, how to actually reply, or to whom to reply. But at the end of the day, it's all just bits being sent in a specific pattern, and that formal documentation is the order in which all of those bits will actually be sent out on the wire or wirelessly. So in short, IP ensures that all of us have unique IP addresses via which data can go from us or to us. But that's only one problem. Nowadays, of course, servers can do so many other things. They can do email and chat and video conferencing, game servers, and who knows what. And it would be nice if a single server certainly could do multiple things. And in fact, that's very much the case with single servers nowadays, and a server is just a term of art for a computer used to serve information to other people. By contrast, our laptops, our desktops are generally clients because they only serve one of us, not multiple people. But these are just terms of art. We're describing, at the end of the day, still computers. IP only ensures that we can uniquely address computers on the internet. But there's another protocol in TCP/IP, namely the TCP portion, that allows computers to uniquely identify services that they're offering to the rest of the world.
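The grid of bits and the two addresses on the envelope can be sketched with Python's `struct` module. This is only an illustration of the fixed 20-byte IPv4 header layout: the helper names are hypothetical, the checksum is left at zero rather than computed, and the other field values are placeholder assumptions. The addresses are the ones used in the lecture for the sake of discussion.

```python
import socket
import struct

# Fixed part of an IPv4 header: five rows of 32 bits, i.e. 20 bytes, in network byte order.
# Fields: version+IHL, type of service, total length, identification, flags+fragment offset,
# time to live, protocol, header checksum, source address, destination address.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def build_header(src: str, dst: str, payload_len: int = 0) -> bytes:
    """Pack a simplified header (checksum left as 0 for illustration)."""
    version_ihl = (4 << 4) | 5  # version 4, header length of 5 32-bit rows
    return IPV4_HEADER.pack(
        version_ihl, 0, IPV4_HEADER.size + payload_len, 0, 0,
        64,  # time to live (placeholder)
        6,   # protocol number 6 means TCP
        0,   # checksum deliberately not computed in this sketch
        socket.inet_aton(src), socket.inet_aton(dst),
    )


def read_addresses(header: bytes) -> tuple[str, str]:
    """Return the (source, destination) addresses written on the 'envelope'."""
    fields = IPV4_HEADER.unpack(header[:IPV4_HEADER.size])
    return socket.inet_ntoa(fields[8]), socket.inet_ntoa(fields[9])


if __name__ == "__main__":
    header = build_header("5.6.7.8", "1.2.3.4")  # Phyllis to Brian
    print(len(header) * 8, "bits in the header")
    print(read_addresses(header))
```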
So for instance, TCP allows a computer to distinguish whether it has received a packet that's an email or a packet that's a chat message or a piece of a video conference or the like, which is to say there's more than just IP addresses on the outside of these envelopes. There are also what are called port numbers as well, similarly numeric values that are usually in the range of zero on up into the low thousands, and they're standardized. For instance, if you are requesting a web page using http://, with which all of us are presumably familiar, unbeknownst to you, on the outside of the virtual envelope that your computer subsequently sends is the port number 80. Because when the server receives that, it knows, oh, this human is requesting a web page and not, for instance, their email or something else. Or nowadays, if you're using HTTPS, where the S denotes secure, in the URL, you're actually using port 443, which is just an arbitrary number that a bunch of humans in a room decided on years ago to standardize what goes on the outside of an envelope. So just to be more clear, then: when Phyllis is sending a request to Brian, and if Phyllis for instance is the client, just a human using a computer, and Brian in this story is now a web server, better yet a secure web server that's somehow encrypting or scrambling the information to keep it secure, well, on the outside of this envelope, after Brian's IP address, which was 1.2.3.4, Phyllis is also going to write the number 443 so that when Brian receives and opens this envelope, he knows what he's looking at: a request for a web page and not an email or a chat message or something else. Moreover, we can continue the story just a little bit further. Phyllis also writes on the envelope not only her IP address, 5.6.7.8, but some number as well in that top left-hand corner, whatever it happens to be, which is a port number via which Brian can reply to her.
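A rough sketch of how a destination port tells a server which service a packet is for, and how the sender's own port lets the reply find its way back. The dictionary packet format, the helper names, and the client port 50123 are all hypothetical; only ports 80 and 443 and the two IP addresses come from the lecture.

```python
# The two standardized ports mentioned in the lecture.
WELL_KNOWN_PORTS = {80: "HTTP", 443: "HTTPS"}


def classify(dst_port: int) -> str:
    """Name the service a packet is asking for, based on its destination port."""
    return WELL_KNOWN_PORTS.get(dst_port, "unknown service")


def reply_endpoints(packet: dict) -> dict:
    """Swap source and destination so the reply goes back to the sender's port."""
    return {"src": packet["dst"], "dst": packet["src"]}


if __name__ == "__main__":
    # Phyllis (client) to Brian (secure web server); 50123 is a made-up client port.
    request = {"src": ("5.6.7.8", 50123), "dst": ("1.2.3.4", 443)}
    print(classify(request["dst"][1]))
    print(reply_endpoints(request))
```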
In this way, Phyllis can in effect have multiple tabs open, be using Zoom and some chat software or something else, running multiple programs on her computer, and the internet packets are all coming in, but her computer knows to which tabs or applications those packets belong. So, if you really want to geek out, here's what this thing looks like. This is just the sequencing of bits for TCP as well, which is to say, in addition to the dozens of bits we looked at a moment ago that standardize what IP is putting on the outside of the envelope, TCP is adding 16 bits that specify a source port number, which means you can indeed have tens of thousands of possible port numbers, a destination port number, and a bunch of other stuff, including this so-called sequence number, which happens to be a 32-bit value, and which is actually pretty important because quite often when sending messages on the internet, they're pretty large. And it would be nice if one person downloading a big image, or one person downloading a movie or streaming a movie, doesn't mean that no one else on the internet can do something else at that moment in time. So for the sake of discussion, suppose that this very happy cat here is a very large JPEG, for instance, a very large graphical file. It would be nice, let's say, that if Phyllis is trying to send or receive an image as large as this, it's not just in one massive envelope that's going to prevent a whole bunch of other users from similarly using the internet at that moment in time. So, at the risk of a bit of heresy, we can actually tear this cat in half, and fragment it really. And then inside of Phyllis's envelope, or equivalently Brian's reply, depending on where this cat is coming from or going to, part of that cat can go in this envelope. And now, say, in the bottom left-hand corner of this envelope, Phyllis or Brian could write the sequence number in question. One out of four, two out of four, three out of four, four out of four.
So that when this, and hopefully the other packets, arrive at their destination, the recipient's computer can check: okay, this was a really big file in this case. Do I have all of the parts? Yes, it can be inferred from the so-called sequence number, which we've represented there in that memo field of the envelope. There's a bunch of other stuff that can go on here too, including prioritization of data as well. But ultimately TCP just allows servers to handle multiple types of services and also allows them to receive data reliably, because if, for instance, a recipient only gets two out of the four packets, or three out of the four packets, the fact that there's a sequence number involved is enough information for that recipient to say to the sender, hey, I'm missing one or two or three or more packets. Please resend them. So in short, TCP guarantees delivery by just doing some bookkeeping on the outside of these envelopes. So in short, IP allows us to uniquely identify computers, and TCP guarantees delivery and allows us to multiplex, so to speak, among multiple services on the same device. Questions on this jargon thus far? Because today's filled with acronyms, unfortunately. Questions on IP, TCP, or anything else? Okay, so seeing none, as promised, let's do yet another acronym. So, it would be pretty tedious if Phyllis and Brian and all of us humans had to actually write IP addresses into our browsers when visiting websites. And in fact, most of us never do that.
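The bookkeeping described above can be sketched as follows. This follows the lecture's "one out of four" envelope metaphor rather than real TCP, whose sequence numbers count bytes instead of numbering whole packets; the function names and the sample data are illustrative assumptions.

```python
import random


def split_into_segments(data: bytes, size: int) -> list[tuple[int, int, bytes]]:
    """Tear data into pieces labeled (sequence number, total, chunk), like '1 out of 4'."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks, start=1)]


def missing_segments(received: list[tuple[int, int, bytes]]) -> list[int]:
    """List the sequence numbers the recipient should ask the sender to resend."""
    if not received:
        return []  # with nothing received, the total is unknown
    total = received[0][1]
    seen = {seq for seq, _, _ in received}
    return [seq for seq in range(1, total + 1) if seq not in seen]


def reassemble(received: list[tuple[int, int, bytes]]) -> bytes:
    """Put the pieces back in order, regardless of the order in which they arrived."""
    if missing_segments(received):
        raise ValueError("cannot reassemble: some segments are missing")
    ordered = sorted(received, key=lambda segment: segment[0])
    return b"".join(chunk for _, _, chunk in ordered)


if __name__ == "__main__":
    cat = b"very happy cat!!"  # stand-in for a large JPEG
    segments = split_into_segments(cat, 4)
    random.shuffle(segments)  # packets may take different routes and arrive out of order
    print(reassemble(segments) == cat)
    print(missing_segments([s for s in segments if s[0] != 3]))  # pretend one was lost
```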
Instead, we go to google.com or harvard.edu, or actual domain names, so to speak, which are so much easier for us humans to remember than these arbitrary IP addresses that are either automatically assigned to computers or manually configured by humans configuring servers. But there's another acronym in the world, and there's another technology used on the internet, namely DNS, for Domain Name System. And this is just a certain type of server that every home has, even if you didn't know it, every campus has, every company has. There are so many DNS servers around the world, but their purpose in life, quite simply, is to translate what you and I know as domain names, like google.com, harvard.edu, and the like, into their corresponding IP addresses. And so in short, inside of these DNS servers is essentially a two-column table or spreadsheet, however you want to think about it, whereby here are all of the domain names in the world, and here are all of the corresponding IP addresses in the world. And so when your Mac or PC or phone being used by you is trying to access google.com or harvard.edu, that device, certainly when it's first booted up, has no idea what the IP address is for that server. It's not the case that Apple or Google are pre-installing billions of IP addresses inside of our devices. But your device is smart enough to ask the local network, at home, on campus, or at work: well, what is the IP address of google.com? What is the IP address of harvard.edu? Then what your Mac, PC, or phone actually does upon getting that answer from one of these local DNS servers is write the corresponding IP address on the outside of that envelope. So it's a wonderfully useful service that just makes the internet more useful for you and me, because we can use names instead of IP addresses as well. Technically these things are called fully qualified domain names. Where do they come from? Well, some of you might actually have your own personal website.
You might have gone through this process. It's actually not that hard to get your own domain name. You can go to any number of what are called internet registrars and pay them some money, and it's essentially on a rental basis. So you rent a domain name for a year, or maybe three or five years at a time, and they can automatically bill you. The domain name might be as little as a dollar per year or thousands of dollars per year, depending on whether someone has scooped it up and is maybe squatting or the like. But all you do ultimately is pay someone money, and they give you the rights to use that domain name. And then what you do technically is configure some DNS server somewhere in the world to know what the eventual IP address is for your server that's going to serve up your domain name's web pages. And long story short with DNS, I say that you have one in your home and at your work and on your campus because it's a very hierarchical kind of structure. Like, there are out there somewhere these so-called root servers that essentially know what all the IP addresses are of all of the .coms, for instance, or all of the .edus or the like. But my Mac doesn't know that, and so my Mac might actually ask that root server, what is that IP address? But more efficiently, my Mac is, better still, going to ask the local network first. When I'm at home, it asks my home DNS server, which is built into the little home router that you've got somewhere in there, or if you're on campus, it asks Harvard's DNS server. And this whole design is recursive, to borrow a term from a few weeks ago, in that if my computer doesn't know the answer, what's the IP address for this domain, and if Harvard doesn't know the answer, it eventually gets escalated to those so-called root servers, but then cached, that is, remembered, by all of these other DNS servers along the way. So, it's a very elegant hierarchical design, but at the end of the day, it's just doing this.
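The escalate-then-cache design can be sketched as a chain of resolvers in Python. This is a simplified simulation that follows the lecture's description, not real DNS (actual root servers refer queries onward to other servers rather than holding every record). The `Resolver` class and the address for example.com, taken from a range reserved for documentation, are illustrative assumptions.

```python
class Resolver:
    """A DNS-like server: a two-column table plus a cache, with an optional upstream."""

    def __init__(self, records=None, upstream=None):
        self.records = dict(records or {})  # names this server knows authoritatively
        self.upstream = upstream            # where to escalate unanswered questions
        self.cache = {}                     # answers remembered from upstream

    def lookup(self, name: str) -> str:
        name = name.lower()
        if name in self.records:
            return self.records[name]
        if name in self.cache:
            return self.cache[name]
        if self.upstream is None:
            raise KeyError(f"no such domain: {name}")
        address = self.upstream.lookup(name)  # recursive escalation
        self.cache[name] = address            # remember it for next time
        return address


if __name__ == "__main__":
    root = Resolver(records={"example.com": "192.0.2.10"})  # hypothetical record
    campus = Resolver(upstream=root)
    home = Resolver(upstream=campus)
    print(home.lookup("example.com"))     # escalated to the root, then cached on the way back
    print("example.com" in campus.cache)  # the campus server now remembers the answer
```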
It's a big cheat sheet of domain names to IP addresses, and the server is responding for us. All right, one more acronym. So, how do I know what my Mac's IP address should be? How do I know what my phone's IP address should be? How do I know what the IP address is of the DNS server of whom I should be asking any of these questions? How do I know the IP address of the router to whom to hand my data off? Like, there are a lot of assumptions built into the story we've been telling. And the answer is, unfortunately, yet another acronym: DHCP is the solution to all of those problems. And it wasn't always. You know, back in my day, we used to have to manually type in what our computer's IP address was based on what some human told us it would be. We had to type in our DNS server, type in our router address. But now DHCP is just yet another server running in your home network, running on campus, running in your corporate network, whose purpose in life is to answer questions of the form, what is my IP address? Which is to say, when you boot up your Mac, your PC, your phone for the first time, it essentially broadcasts a message like, hello world, what's my IP address? And hopefully there's one such DHCP server on that local network, wired or wireless, that will respond, based on how Harvard or Comcast or Verizon or someone at home has configured it, to tell you what your device's IP address is, what the IP address is of your local router, what the IP address is, or addresses are, of your DNS servers, and the like. And so this is why things just work nowadays once you've connected to like a Wi-Fi network or physically plugged in. Dynamic Host Configuration Protocol didn't always exist. Wonderful that it now does. All right, enough sort of outside-of-the-envelope stuff.
Everything else today will be a deeper dive inside of this envelope, to look at what the messages actually are that we are sending and receiving, and how you structure the web pages and design everything that comes back from the server to the client. And let's dive in, then, to this acronym HTTP, which you've been typing for years or seeing for years, even though you don't really have to type it anymore because browsers just assume that this is what you want. But HTTP is another protocol, Hypertext Transfer Protocol, whose purpose in life is to request web pages and receive web pages. As a protocol, it just standardizes what goes inside of that envelope when you're trying to use the web. There are different protocols for email, different protocols for Zoom, different protocols for Discord, and any number of other internet services. We'll focus predominantly today on HTTP, which happens to use ports 80 and 443, among others, as we saw. So let's see what HTTP is all about, or HTTPS, the corresponding secure version thereof. So here is a URL, a canonical URL in that it has a whole bunch of components. Let's consider what some of the jargon is that we're going to start taking for granted. So if you go to https://www.example.com/, you are implicitly requesting the root of that website. Root just means the default directory, the default folder, if you will. And that's what the yellow highlighted slash here just means: give me the default web page. Technically speaking, what you're going to receive in your browser, unbeknownst to you, is an actual file. By convention, it's a file called index.html, maybe index.htm, or any number of other files. But it would be pretty stupid if we as humans all had to type out the actual file name that we want. So the server by default is just going to return you the root of the website.
If, though, you're inside of a folder, or you do actually click on a link that leads you to a file, you might very well have at the end of this domain name a full path as well, which might contain zero or more folder names and zero or one file names as well. In fact, it could be explicitly /file.html or /folder/ or /folder/file.html. You've probably seen thousands of these over time, even if you haven't really given it much thought. So we, today onward, will be creating all of this stuff here, but we need to understand what's going on to the left too. So here is the so-called domain name, or more properly the fully qualified domain name, and it has a few different parts too. So this is technically the domain name as we all refer to it. Something.com means commercial, and that com is more specifically known as a top-level domain, or TLD. Back in the day there were only a few of these: .gov, .com, .net, .org, .edu, and a bunch of others. Now, there's hundreds, if not thousands, of them. Many of them aren't really used prominently in the wild, but there are some not on that original list. Like, CS50 uses .io a lot, which doesn't mean input output. It's actually a two-letter country code that has been essentially rented to us and anyone else using that same TLD because, in the English-speaking world, .io actually sounds kind of cool. It kind of connotes indeed input and output. .tv is another one that actually belongs to a country but in fact also sounds like, in English, television, and so that too has been used as well. But in general there are top-level domains like these. Some of them now are full words. Some of them are two characters, denoting that they belong to a country. They are the sort of top-level, indeed, categorization of all of these websites. Meanwhile, many URLs, but not all, also have something to the left of the domain name known as a host name, which technically speaking refers to the name of the server that you're requesting specifically.
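The parts of a URL named so far can be pulled apart with Python's standard `urllib.parse.urlsplit`. The `describe` helper below is a hypothetical illustration, and its way of finding the host name and top-level domain, simply splitting on dots, is a naive assumption that works for www.example.com but not for every real-world domain.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Break a URL into the pieces of jargon used in the lecture (naive sketch)."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")  # e.g. ["www", "example", "com"]
    return {
        "scheme": parts.scheme,                              # the leading "https"
        "host_name": labels[0] if len(labels) > 2 else "",   # e.g. "www"
        "domain": ".".join(labels[-2:]),                     # e.g. "example.com"
        "tld": labels[-1],                                   # e.g. "com"
        "path": parts.path or "/",                           # the root if no path is given
    }


if __name__ == "__main__":
    print(describe("https://www.example.com/folder/file.html"))
    print(describe("https://www.example.com"))
```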
It doesn't have to be literally one server. www can refer to dozens, hundreds, or thousands of servers. Indeed, if you go to any popular website like gmail.com or the like, even though you only have one domain name, somehow or other, technologically, it is referring to clusters of hundreds or thousands of servers that ensure that they can handle all of the customers that might visit that site. And then lastly, there's this: the scheme, or the protocol in use specifically. And for our discussion today, it's always going to be HTTPS, which is ideal because it's secure and encrypted somehow. But it can also be, indeed, http colon. So that's it. Like, that's just the jargon with which you should be familiar when it comes to URLs like these. And what we'll be doing today is actually creating content that lives at URLs like that and serving it up to us. But what do the messages ultimately look like that are going inside of these envelopes? What the URLs are doing is just getting us to the right place. But how do we express in some form of code that we want this file from this server, using encryption in this way? Well, inside of the virtual envelopes that Phyllis was sending to Brian, and that he would have ultimately sent back to her, are messages that look like this: GET, POST, and a bunch of other verbs, if you will. So, HTTP supports a bunch of operations or verbs, namely GET, POST, and a few others. And it was the first of these that Phyllis would have put inside of her envelope initially in order to get a web page, like a cat, from Brian. Specifically, inside of the envelope, she would have had a textual message. It's not code per se. There are no functions or loops or variables or anything like that. It's a protocol just in the sense that humans years ago standardized what messages should appear inside of those envelopes if you want to get a web page from a server.
So for instance, if Brian in this story is now suddenly harvard.edu, specifically www.harvard.edu, Phyllis's envelope would have contained a message saying GET, in all caps; a slash, if she just wants the root or the default page from Brian's server; and the version of HTTP that she's using, for instance, version two. And she would also specify, just in case Brian is multitasking and serving up websites for different domain names on the same physical box, which actual host she wants, and maybe a bunch of other lines as well. And hopefully, if all goes well, Brian would have responded with an envelope of his own containing an HTTP response in answer to her HTTP request. And Brian's envelope would have contained a textual message that just confirms what version of HTTP he's using and a status code, which is an arcane number that just indicates, in this case, that everything is OK. All is well. And he would specify the type of content he's sending back to her in his own envelope, because it could be HTML (more on that later today), it could be a JPEG, it could be a GIF, it could be any number of other file formats. And this is just a hint to Phyllis's browser as to what's going to be inside of that envelope she is getting back. And then maybe a bunch of other stuff as well. So even though some of these underlying implementation details might be new to you if you've never really thought about them, it turns out we, as aspiring programmers, can actually see and poke around with these building blocks and, ultimately, today, take advantage of them. So you're about to see a program that's called curl, which stands for connect URL. It's installed in Linux systems like cs50.dev. It also comes with Macs and PCs quite frequently, or you can easily install it.
And essentially it's a headless browser that allows you to pretend to be a browser and grab the response from a server by actually sending the contents of an envelope like this. So for instance, if I want to pretend to be a browser and request harvard.edu, I can type this in my cs50.dev terminal window. Let me go ahead and maximize its size and do the following: curl -I, which specifically is only going to show me the headers, the text that we were just talking about. It's not going to show me any of the contents of Harvard's website. So: curl, dash, capital I, https://www.harvard.edu/. If I were typing this into a browser, I would actually see Harvard's home page. In this case, I'm just going to see the contents of the envelope as black-and-white text on the screen. Specifically, only the first few lines, the so-called headers that the server is responding with, just as I claimed Brian would to Phyllis. I hit enter, and there are indeed more lines than I had in my slide, but you can see that everything is in fact 200 OK. This is a convention: 200 means all is indeed OK. There's a bunch of other information here, including the date and time at which this response came back. Here's that Content-Type, text/html, and then a whole bunch of other details as well. So that's one way of seeing what's going on underneath the hood. Well, what other responses might come back? It turns out that 200 OK is the best possible outcome, but there's a bunch of other outcomes that are possible as well. For instance, sometimes you'll get not 200 but 301, which means Moved Permanently. What does this mean, colloquially speaking? Well, if a server responds to a browser with a numeric code of 301, that means that the browser is supposed to go to this location instead. It's sort of like putting a detour sign on the server that says there's nothing for you here; go over here to this location instead.
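The textual messages inside those envelopes can be sketched in a few lines of Python, without touching the network. The helper names build_request and parse_response_head are assumptions for illustration, the sample response is hand-written rather than captured from Harvard's server, and the sketch uses the HTTP/1.1 text format because it is the simplest one to write out by hand.

```python
def build_request(host, path="/"):
    """Return the text a client might send to ask `host` for `path`."""
    lines = [
        f"GET {path} HTTP/1.1",  # verb, path, protocol version
        f"Host: {host}",         # which site, in case the server hosts several
        "",                      # a blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_response_head(text):
    """Split a response's status line and headers into Python values."""
    head = text.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers


# Hand-written sample of a redirect, not real server output.
SAMPLE_RESPONSE = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: https://www.harvard.edu/\r\n"
    "\r\n"
)

if __name__ == "__main__":
    print(build_request("www.harvard.edu"))
    print(parse_response_head(SAMPLE_RESPONSE))
```

Reading the "location" entry out of the parsed headers is the detour sign in code form: it is where a client receiving 301 would go next.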
And now notice, in this example, it's telling the user to go to https://www.harvard.edu/. That's actually what I typed before, so I would not have seen that myself. But if I go back to VS Code here, let's run the exact same command, but let's try to visit the insecure version of Harvard's website, http://www.harvard.edu/, which just means that anyone else on the internet can technically see what it is I am now doing with my browser, which might not be desirable. Enter. This time, Harvard's server does not just tell me 200 OK. It actually says 301 Moved Permanently, and if I read lower in these lines, there indeed is the location to which I should actually go. And it's a subtle difference: it's forcing me to go to https instead, without actually showing me the contents of Harvard's website. So nowadays you and I don't even have to think about this. You and I are surely not even in the habit of typing http: or https:. But the browser is ensuring in this case that you are redirected, so to speak, automatically to the secure version of that site instead. Now, there are other status codes, and in fact, even if you never realized it before now, what numeric code do you sometimes see on the internet when something goes wrong? 404. So 404 is a weirdly public, arcane error number, or status code, that just means file not found. And we can simulate this as follows. For instance, if I in my terminal window do curl -I https://www.harvard.edu/cats, I'll suppose that Harvard has a whole department dedicated to cats, which it does not. But if I hit enter here, you'll see that I get an HTTP 404 status code, which just means that page does not in fact exist. And if I visited https://www.harvard.edu/cats in my browser, I would presumably see some error page that may or may not show me, visually, 404. But many websites, most websites, for better or for worse, reveal this number.
So much so that most everyone in this room is probably familiar with 404, even though its origin is this very low-level, arcane status code buried in the HTTP headers inside of envelopes like these. There's a whole bunch of others, if you'd like some fun facts. 200 is indeed OK. 301 is Moved Permanently. There's a bunch of other 300s that all relate to go elsewhere. The 400s generally mean that you, as the user, have somehow done something wrong, or, next week, as we start writing code that talks to web servers, maybe your code has done something wrong when requesting a website. 500s are really bad. They mean the server is messed up somehow. Either it's not available, or the programmer made some bug in their code such that it's crashing with, for instance, something like an internal server error. We included 418, which is not actually a thing, but it was a fun sort of April Fool's joke years ago, where a bunch of humans thought it would be funny to write up a whole specification for what it means for a server to respond with a number of 418. Inside joke, not funny at the moment, but it is sort of part of internet lore nowadays. We can have a little bit of fun with this, maybe at the expense of our dear friends down the road. For years now, someone has been paying for the following behavior. Let me go back to VS Code here, in my terminal window. Let me do curl -I http://safetyschool.org. Have you ever been? Ever applied, perhaps? Well, let me actually go to http://safetyschool.org and, just for fun, hit enter. Oh my goodness, look at where we are. So, how is this implemented? Well, let me finish what I began over here by just looking at the HTTP headers inside of the envelope my actual browser just sent to safetyschool.org. For like 20 years, presumably, some Harvard alum has been paying the bill to rent this domain name just to have this trick implemented, such that 301 Moved Permanently has been directing people ever since to yale.edu.
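The families of status codes just listed can be summarized in a short sketch. The function names and the wording of each family's description are assumptions for illustration; the reason phrases themselves come from Python's standard http.HTTPStatus table.

```python
from http import HTTPStatus


def status_family(code):
    """Describe the family a status code belongs to, by its first digit."""
    families = {
        2: "success",
        3: "redirect: go elsewhere",
        4: "client error: the request was wrong",
        5: "server error: the server failed",
    }
    return families.get(code // 100, "other")


def describe(code):
    """Combine the standard reason phrase with the family description."""
    return f"{code} {HTTPStatus(code).phrase}: {status_family(code)}"


if __name__ == "__main__":
    for code in (200, 301, 404, 500):
        print(describe(code))
```

The first digit alone is usually enough to tell who needs to act: the browser (3xx), the requester (4xx), or the server's maintainers (5xx).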
There's a bunch of others if you go down the rabbit hole of looking on Reddit and the like. Stanford, Berkeley: there's a healthy competition on the East Coast and the West Coast. But it all boils down to a very arcane understanding of how HTTP works, the protocol that governs how data is sent from web browsers to web servers. Now, you can of course use curl for connecting to URLs in the context of something like CS50. You could have been doing stuff like this all the time, though, with your actual browser. So, I'm using Chrome here, but most any browser nowadays has the ability to give you developer tools natively, which is to say, somewhere there should be a menu option that lets you use developer tools that are conducive to someone who knows a bit of programming poking around underneath the hood of the browser and seeing what's going on. For instance, I'm going to go ahead and open up a new window here, and I'm going to right-click on the background, or I can go to the appropriate menu in Chrome's dot-dot-dot menu, and I'm going to go to Inspect, which pulls up what we're going to call developer tools. I'm doing it in incognito mode for reasons we'll see next week. This has the effect of clearing automatically any of my cookies and my browser history, because most anytime I do something with the web browser today, I want to pretend like I'm doing it for the very first time, so that the behavior is exactly as we expect. So down here, now that I've opened up the so-called developer tools in Chrome, and they look almost the same in Safari and Edge and a bunch of other browsers as well, I will see a tab called Elements, which shows me all of the elements of this web page once it appears, including the so-called HTML code we're about to write. I can see a Console, where error messages might sometimes appear, similar in spirit to the terminal window in VS Code. I can also see the network connections that the browser is making to the server.
And that's where I thought we'd start our attention here. Here I have a brand new browser window. I'm clicking on Network over here. Just to make sure we can see everything without it getting automatically deleted, I've clicked on Preserve log and Disable cache, just so that it behaves exactly as expected. And now let's go up here, for the first time in this incognito window, and go to http://safetyschool.org. Enter. And you'll see a whole bunch of output, including this warning in this particular mode. This is increasingly common nowadays for websites that do not support HTTPS, which this alum hasn't been paying for. You'll get a warning, typically, that specifies you might not want to do this because the whole world, at least the whole world between you and point B, might know what it is you're accessing on the web. I can go ahead and pass through this. In fact, once I do that and click on Connect to site, we'll see even more output at the bottom, a whole bunch of output that's kind of overwhelming. Notice at bottom left here, just going to safetyschool.org resulted in 61 HTTP requests, in effect, 61 envelopes going back and forth. I'm going to focus, though, on the ones at the very top here, whereby, when we finally clicked through that warning and I got back a response from the server, having visited safetyschool.org, here is Chrome's presentation of the same information that curl was showing me in my terminal window. The message that came back was 301 Moved Permanently. The protocol, or the verb, being used was GET. There are some mentions of the IP address in question here and a whole bunch of other stuff that we'll wave our hands at for today. So all of this time you could have seen the same. And let's try this with some cats. Let me click on the little ghostbuster symbol to clear everything down in the developer tools. Let me zoom out, and this time let me go to https://www.harvard.edu/cats, which, recall, did not exist according to curl.
If I hit enter, I do see a web page. It's interesting that Harvard has chosen to fairly arcanely reveal to all visitors 404, which means nothing except insofar as it's the status code. But if I scrolled through all of the 59 requests that were involved in just displaying this very graphical page and go back to the top, you'll see, by clicking on the first row, for cats itself, that I used GET to get that URL, /cats in the end, and it was indeed 404 Not Found. So you can sort of have all this fun on your own by just poking underneath the hood of what your browser has been hiding from you all of this time. All right. Any questions now before we dive in? No. All right. Well, that's the Network tab. Let's look at some of the others and see how we can start writing this stuff ourselves. Let me go to stanford.edu. Enter. A whole bunch of things will fly across the screen, but this time I'm going to go to the Elements tab. And what we're about to dive into is an actual language, not a programming language, a markup language called HTML, HyperText Markup Language, whose purpose in life is just to tell browsers what to display on the screen. So here is all of the so-called HTML that some human or humans or software at Stanford wrote in order to create Stanford's home page, which as of today looks lovely, like this. The interesting thing, though, about the code that Stanford has written to generate this website is that it's being sent to me as a copy. And this is quite unlike the code we've been writing thus far. When you wrote code in Scratch, it was sort of there in the browser and stored on MIT's server. When you wrote C code and ran it, it was inside of the codespace and not given to any user who might access it. The way the web works, though, is a little bit different. Inside of those envelopes are literally copies of what's on the server being sent to the browser.
And so it's your browser, the so-called client, that's actually reading that code, HTML in this case, top to bottom, left to right, and figuring out how to display it. It's not executed on the server per se. Now, that story is going to change a bit next week, when we start using Python to dynamically generate HTML so that we're not writing all of this code by hand after this week, but for now, everything you see was the result of the browser executing code that Stanford wrote. The implication of that is that we can have a bit of fun with these same developer tools. For instance, if I control-click or right-click on something like the word Stanford in the middle of their home page and choose that same Inspect option, what's nice about these developer tools is that it's going to jump to the very line of code that created that Stanford brand name in the middle of the web page. And this is a wonderful teaching and learning tool, because in the days to come, when you're trying to learn more and more HTML, you can literally do this for any website on the internet and understand how it is someone implemented a design, for instance, that you really like, and you can learn from other websites how they've constructed the same. So over here you'll see that the word Stanford is just in the source code of this page, in the so-called HTML, and, you know, just for fun I can change it to Harvard. Hit enter, and now Stanford's website looks like we've been there and rather hacked it. Of course, it's not that easy to hack Stanford's website. What have I presumably only done just now? I've changed my local copy of that particular website. So, if I just click on the reload icon, I'll actually see that Stanford's website, for better or for worse, still looks like that. But this speaks to the control that we have within our browser to actually manipulate and learn from what it is that's going on underneath the hood. So, let's dive into this language called HTML, HyperText Markup Language.
It's not a programming language, which means we're going to fly through it even quicker than usual, because it really just contains some basic building blocks that do have some interesting intellectual design under them, but for the most part, it becomes an exercise ultimately in just, like, looking up other tags that exist, reading the documentation, and figuring out how you can use them to implement other features in websites. So, let's take a look at perhaps the simplest of web pages and specifically glean from it what tags are and what attributes are, really the only two terms of art that are going to be germane for this particular language. No loops, no conditionals, no variables, no complexity really, other than basic building blocks like these. So here is HTML for the simplest of websites. This is like a mini version of what Stanford's team presumably wrote on their server, but it's only like a dozen lines of code instead of hundreds or thousands, however long that website was. Any web page written today, assuming it's using the latest version of HTML, which happens to be version five as of today, begins with code that looks like this. This kind of code will presumably be stored in a file called file.html, index.html, stanford.html, whatever the file is actually named. This is simply what's going to be inside of the contents. You could save this file on your own Mac, open it up, and your browser would open it, but you're going to be the only one in the world that can actually see the contents of that web page if it's just on your Mac or just on your PC. So we, of course, are going to be writing HTML on a server so that not just you but, in theory, especially for your final project, anyone in the world with an internet connection can access the same. So we, within the context of cs50.dev, are going to start using this new command, http-server, whose purpose in life is just to serve up files via HTTP.
Now, there's kind of an interesting design going on here, because if we use cs50.dev, otherwise known as GitHub Codespaces, there's already a web server running on that website, because when you go to cs50.dev and log in and get redirected to some longer URL, you're using a web application, aka VS Code, that allows you to write code in the cloud. Now, that application by default is running on ports 80 and 443. So it doesn't matter if you start at HTTP or HTTPS; both will work. But that means that the code that we write today, and that you write for the next problem set or for your final project, can't live at port 80 or port 443, because GitHub, the company that hosts this, is already using those default standard ports. But we can use any number of other port numbers. I claimed earlier there are tens of thousands of numbers that we could use. So that's what we're actually going to do. So let me go back to VS Code here. Let me shrink down my terminal window. Let me create a first file today called, for instance, hello.html. Enter. And now I've got an empty tab as usual. I'm going to very quickly whip up the exact same contents that we just saw. So an angled bracket, an exclamation point, DOCTYPE html, then open bracket, html, close bracket, and notice the autocomplete kicked in for this particular language, so I don't have to type everything myself. Inside of this tag, so to speak, I'm now going to put a head tag, inside of which is going to be a title tag. I'm going to say something like hello, title, just to be quick. And then down here, below those lines, I'm going to put a so-called body tag, inside of which is hello, body, just for some quick text. And that's it. This is now a file inside of my codespace. And there's no command to just compile or run this in the terminal, because the goal is going to be to open this HTML file with a browser. If I want to do that in another browser tab, I need to tell my codespace to serve that file via HTTP.
So, the simplest way to do this is as follows: http-server, enter. You're going to see a whole bunch of text on the screen. You're going to see a green button hopefully pop up that says Open in Browser, which is going to allow you to open up, and I'll zoom in, the contents of the current folder with a web browser. My URL has changed to be different from what it was a moment ago. I came in advance today with my own folder of code, like we usually do, source 8, which contains all of today's pre-made examples. But here is the file I just created a moment ago. What we're looking at at the moment is just a directory listing, a directory index of all of the files in my codespace right now. And if I click on that hello.html, I see the simplest of web pages. It's a little underwhelming, but clearly here's hello, body, which takes up like 95% of the screen, the so-called viewport, which is just a big rectangular region of the screen, but there's the title in the tab up there. So, if you've ever wondered or cared, like, where does the content in a web page come from, well, here's the body content. Here's the head, or the title, content. And then everything else is just sort of icing on the cake. So, I've written at this point a file called hello.html. It has yielded this effect of having something in the head of the page and the body. But let's actually tease apart what just happened. So at the start of any file written in this language called HTML, the latest version thereof, five, it literally just starts with this. And this is just the kind of thing you memorize or copy-paste: open bracket, exclamation point, DOCTYPE html, close bracket over there. It looks a little bit different because we're not going to use, for the most part, the exclamation point syntax anywhere else unless we're using an HTML comment. So HTML has comments, just like Python, C, and other languages. But let's focus really on this juicier part.
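The serving step can be approximated with Python's standard library alone. To be clear about the assumptions: this is not the http-server command used in the lecture, only a comparable sketch; the names serve_folder and fetch_hello are hypothetical; the page is written to a temporary folder; and the server listens on 127.0.0.1 with port 0, which asks the operating system for any free port rather than 8080.

```python
import functools
import http.client
import http.server
import pathlib
import tempfile
import threading

# The page typed in the lecture, reproduced as a string.
HELLO = """<!DOCTYPE html>
<html lang="en">
    <head>
        <title>hello, title</title>
    </head>
    <body>
        hello, body
    </body>
</html>
"""


def serve_folder(folder, port=0):
    """Start a background web server for `folder`; port 0 picks a free port."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(folder)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_hello():
    """Save hello.html, serve it, request it once, and return (status, body)."""
    with tempfile.TemporaryDirectory() as folder:
        pathlib.Path(folder, "hello.html").write_text(HELLO)
        server = serve_folder(folder)
        try:
            conn = http.client.HTTPConnection(
                "127.0.0.1", server.server_port, timeout=5
            )
            conn.request("GET", "/hello.html")
            response = conn.getresponse()
            status, body = response.status, response.read().decode()
            conn.close()
            return status, body
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(fetch_hello())
```

The request made inside fetch_hello plays the role of the browser tab: it sends GET /hello.html and receives a copy of the file back.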
Here we have what's known as an element in HTML. An element includes a start tag and an end tag or, equivalently, an open tag and a close tag. So here, for instance, is syntax that essentially is going to tell the browser, when my browser reads this file top to bottom, left to right: hey browser, here comes the HTML of my page, and the language in which the contents of this page are written is English. So html, all lowercase, is the name of the tag, so to speak, and equivalently the name of the element. lang is what's going to be called an attribute, which just modifies the default behavior of the element, and quote-unquote en is the value thereof, which is the shorthand notation for English, and there are shorthand notations for most every human language as well. So you have a tag name and an attribute with a value. And we've seen these things so many times, these key-value pairs, in the context of dictionaries or hash tables or any number of other contexts. Key-value pairs in HTML are separated by an equal sign, with the value typically quoted in this way: double quotes or single quotes, but being consistent. Then notice at the end of this file, as per the indentation, there's something symmetrically down here that has the effect of closing the tag, or ending the tag. And this effectively tells the browser, "Hey browser, that's it for my HTML." Meanwhile, everything else follows a similar paradigm inside of those two tags. Here is a head tag that says, "Hey browser, here comes the head of my page. Hey browser, that's it for the head of the page. Hey browser, inside of the head, here comes the title. That's it for the title." Well, what is the title? hello, title, just as I wrote in my codespace. Same story for body. Hey browser, here comes the body of the page, the 95% of the screen. That's it for the body. But what's in the body is exactly that. The indentation is nice and pretty-printed. I've used four spaces, as we commonly do. Not strictly necessary.
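To see tag names, attributes, and the symmetry of open and close tags concretely, Python's built-in html.parser can report each one as it reads the page top to bottom. This is a minimal sketch: TagLister and list_tags are hypothetical names, and the page string is a reconstruction of the hello.html described above.

```python
from html.parser import HTMLParser

# Reconstruction of the page described in the lecture.
PAGE = """<!DOCTYPE html>
<html lang="en">
    <head>
        <title>hello, title</title>
    </head>
    <body>
        hello, body
    </body>
</html>
"""


class TagLister(HTMLParser):
    """Record every start tag (with its attributes) and every end tag."""

    def __init__(self):
        super().__init__()
        self.opened = []  # (tag name, {attribute: value}) per start tag
        self.closed = []  # tag names in the order they were closed

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (key, value) pairs, e.g. [("lang", "en")]
        self.opened.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.closed.append(tag)


def list_tags(html):
    lister = TagLister()
    lister.feed(html)
    lister.close()
    return lister.opened, lister.closed


if __name__ == "__main__":
    opened, closed = list_tags(PAGE)
    print("opened:", opened)
    print("closed:", closed)
```

The attribute arrives as an ordinary key-value pair, which is why turning it into a dictionary is a natural fit.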
In fact, in my own codespace, I didn't even bother putting these on three separate lines. I just did one line. That's fine because, as we'll see, browsers typically ignore whitespace. But I've done it there, as we often do, just to ensure that things are pretty-printed and therefore readable by us humans. Let me call your attention to one other thing on the screen. Up until now, before every lecture, I've been hiding a whole bunch of tabs in my terminal window. But today, I left enabled one that you've probably seen but not cared about before, namely Ports. And it's under this Ports tab that you can actually see a real incarnation of a TCP port. By default, when you run the command http-server, it serves up my current folder's contents on its own web server, its own HTTP server, but not using the default port 80 or 443, because GitHub is already using those on cs50.dev and their product. By default, we've chosen another common developer port number, 8080, which is interesting only insofar as it's 80 twice. It's a human convention, but it could have been any number of thousands of other possibilities. But this line here is just telling me that I am apparently running a server on port 8080. And if I click on there, too, I can manually open the same tab. But that's what the green button was doing for me. It was informing me: hey, you've just started a web server on this port. Do you want to open a new tab with the contents thereof? So this is the picture we're now painting. Let me pull back up the code that we just wrote, and let me propose that what we've really done is built a tree in the browser's memory. So we kind of have come full circle with week five, when we talked about trees and other hierarchical structures. If we assume that the document can be represented with a node that looks a bit like an oval up here, which just represents the whole contents of the file, well, it starts with a single root element by convention, the html element.
And your page can have only one of those elements. But inside of the html tag can be a head tag and a body tag. And in this case, the head tag, recall, had a title tag as well as the actual text thereof, which was hello, title. Meanwhile, the body had just the text thereof as well. And so when I keep saying that the browser is downloading the file, for instance, hello.html, and reading it top to bottom, left to right, it's doing literally that, but somehow or other, it's using malloc, or whatever the equivalent is in the language it's written in, to allocate node, node, node, node, node, and populating that tree in your browser's memory, or RAM, a data structure quite like that. So, it's all sort of germane to where we've been before. Before we take, I think, a snack, are there any questions about what we've just seen? Anything at all? I shouldn't have prefaced this with the only thing between us and snacks is these questions. No? All right, snack time. All right, see you in 10. Snacks. All right, so we are back, and pretty much everything we do from here on out will look structurally like this. We're just going to introduce a few more tags and a few more attributes to give you a sense of some of the basic building blocks of most any website out there. And you'll find pretty quickly that it starts to get kind of tedious writing it out. In fact, I will resort to some copy-paste today just to kind of speed things up. But this is going to motivate, indeed, next week, when we reintroduce Python as well as SQL to actually automate generation of HTML as well. So all of today's websites and many of today's mobile apps are written in HTML. But people are decreasingly writing this kind of stuff by hand. Rather, they are writing code that generates precisely what we're going to learn. So understanding the fundamentals will still be useful so we know what code to write next week and beyond. So let me go back into VS Code here.
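The tree described before the break (a document node, an html root, head and body beneath it, and text at the leaves) can be built in a few lines with Python's html.parser. This is a sketch under stated assumptions, not how any real browser is implemented: TreeBuilder, build_tree, and outline are hypothetical names, nodes are plain dictionaries, and the code assumes every start tag has a matching end tag, which holds for the hello.html reconstruction used here.

```python
from html.parser import HTMLParser

# Reconstruction of the page described in the lecture.
PAGE = """<!DOCTYPE html>
<html lang="en">
    <head>
        <title>hello, title</title>
    </head>
    <body>
        hello, body
    </body>
</html>
"""


class TreeBuilder(HTMLParser):
    """Build a nested tree of elements and text while reading top to bottom."""

    def __init__(self):
        super().__init__()
        self.root = {"name": "document", "children": []}
        self.stack = [self.root]  # path from the root to the currently open element

    def handle_starttag(self, tag, attrs):
        node = {"name": tag, "children": []}
        self.stack[-1]["children"].append(node)  # attach to the open parent
        self.stack.append(node)                  # this element is now open

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()  # assumes well-formed, properly nested tags

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only runs between tags
            self.stack[-1]["children"].append(text)


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, two spaces per level."""
    if isinstance(node, str):
        return ["  " * depth + repr(node)]
    lines = ["  " * depth + node["name"]]
    for child in node["children"]:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(build_tree(PAGE))))
```

The stack is what makes the nesting work: whatever element is on top of it when a new tag or piece of text arrives becomes that item's parent.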
And what I'm going to go ahead and do is open up another terminal window so that I can leave http-server running in this first terminal window. And what I'm going to go ahead and propose that we do is implement a web page that has not just a single line of text but maybe some paragraphs. So I'm going to call this paragraphs.html. That's going to open up a new tab. And here's where I'm going to save some time. I'm going to go back to hello.html and just highlight all and copy-paste this as the beginning of this file. But what I'll start doing is just changing the title of each page to match the file name. So this is going to be my paragraphs example. And instead of saying just hello, body, let's actually have a few paragraphs of text. I'd rather not waste time writing even full paragraphs of text. So let's actually open up the duck, and let's log in and, for instance, just ask it for a quick helping hand here: write three paragraphs about computer science. I don't really care what the output is. All I want is some dynamically generated text to save me some keystrokes. And here we have an educational answer there, too, even though all we really care about today is the fact that this is three chunks of text. Hopefully, that's all quite accurate. All right, I'm going to go ahead and highlight all of that, go back into my paragraphs.html tab, and paste it inside of the body. The paragraphs are so long that the text scrolls. I can at least clean this up slightly. I'm going to go ahead and just indent it twice, just so that at least it's pretty-printed inside of the body. And now I'm going to go back to my other tab, which represents the contents of hello.html. I'm going to click back, which is going to show me that same directory listing again, which now has a new file, paragraphs.html, and I'm going to click it so as to see these three paragraphs of text. What looks wrong? Yeah? >> Paragraphs. >> There are no paragraphs. It's just one big blob of text.
It's the same text, but buried in there is the end of the first paragraph and the start of the next, and same for the third. So, what's going on? Well, apropos of my comment earlier about browsers not really caring about whitespace, you can put all the whitespace you want there. It's just going to ignore it in this particular case. All it's going to give me, minimally, is a single space between each of these paragraphs of text. So, HTML is very pedantic. Like, if you want there to be more paragraphs, you need to tell the browser: put a paragraph here, put a paragraph there. And the way to do this, thankfully, isn't all that hard. I'm going to go inside of the body here, and I'm going to simply open a tag called p, for paragraph for short. Notice that VS Code in this particular case is a little annoying, because it's trying to finish my thought, but it doesn't know that I already wrote this text. So, I'm just going to delete what it automatically generated. And then I'm going to manually indent this. And I'm going to do the same thing again for the other paragraphs. Up here, I'm going to open the paragraph tag. I'm going to delete temporarily the close tag so that I can actually put it below that chunk of text here. Indent this, and then down here, and this would have been easier if I just did it right the first time, I'm going to do the same thing with the third and final paragraph. So now what we in effect have, three times in a row, is: hey browser, here comes a paragraph, then the first paragraph. Hey browser, that's it for the paragraph. Hey browser, here comes a paragraph. That's it for the paragraph. Hey browser, here comes a paragraph. So, three times in total, with open, close, open, close, open, close. Now, if I go back to the browser, nothing appears to have changed yet, but that's because I'm looking at a copy that was downloaded a moment ago in that virtual envelope. So, this is why, among other reasons, we hit reload on web pages to get the latest version.
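The two ideas here, that runs of whitespace collapse to a single space and that each paragraph needs its own open and close tag, can be sketched as two small Python functions. Both names are hypothetical, collapse_whitespace only approximates what a browser does with ordinary text, and to_paragraphs assumes paragraphs in the pasted text are separated by blank lines.

```python
from html import escape


def collapse_whitespace(text):
    """Approximate how a browser renders plain text: any run of spaces,
    tabs, or newlines becomes one space."""
    return " ".join(text.split())


def to_paragraphs(text):
    """Wrap each blank-line-separated chunk of `text` in <p>...</p> tags."""
    lines = []
    for chunk in text.split("\n\n"):
        body = collapse_whitespace(chunk)
        if body:  # ignore empty chunks left over from extra blank lines
            lines.append(f"<p>{escape(body)}</p>")
    return "\n".join(lines)


if __name__ == "__main__":
    pasted = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    print(collapse_whitespace(pasted))  # one blob, as the browser first showed it
    print(to_paragraphs(pasted))        # three explicit paragraphs
```

The call to escape is a precaution so that characters such as < or & in the pasted text are not mistaken for markup.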
And voila, now we have three actual paragraphs. The whitespace is inserted automatically by the browser, but it's at least prettier to the eye now. So, that then is the paragraph tag. So, useful, of course, if we have paragraphs of text. What are some other tags we might introduce? Well, maybe you're writing a paper or a blog post or the like. It's pretty typical to want headings for sections of the page. Maybe chapters and then sections and then subsections or the like. HTML can help with this, too. So, let me go into my terminal window again and create a file called, how about, headings.html. And then in this file, let me similarly go back to hello.html and copy-paste it into headings. I'm going to close paragraphs because we're done with that. And I'm just going to change the title now to headings. And inside of the body here, what I'm going to go ahead and do is, you know, it would have been nice to have some of that same text. Let me go back one step. Let me grab the paragraphs and paste that into this new file. Let me rename it to headings to make clear which file we're in. And now let me go ahead and propose: wouldn't it be nice if I made clear that this is the first paragraph? So I'm going to use the h1 tag, which is the heading one tag. And I'm just going to say One for the sake of discussion. And down here, I'm going to say h2 and say Two for the sake of discussion. And down here, h3, Three, because I don't really care what these things are called. I just want to demonstrate the functionality. If I go back to my other tab now, back to the directory listing, there's my brand new file, headings.html. And it's the same paragraphs, but now you have some big, bold text that looks reminiscent of the chapter heading, the section heading, the subsection heading, and the like, or that you might see on a news site or a blog site or the like. So you've got h1 through h6, from biggest and boldest to smaller but still bold.
And the browser decides on all of those settings for us. But it also gives some semantic clarity: probably the most important thing on the page, at least to begin with, is that H1 tag, and then everything else is like supporting paragraphs or arguments or whatever the case might be. There's a hierarchy implicit there. All right. What are some other things we can do with web pages? Well, let me open my terminal window again and why don't we code up, how about, a list of values, because lists are everywhere on the internet. So, let me open up list.html and then close my terminal. I'll go ahead and start with that same file, headings.html, paste it into list, change the name here. Let's delete everything I did. And again, the only reason I'm copying and pasting is just to avoid writing out the same boilerplate code again and again with the HTML tag, head tag, body tag, and so forth. Let's focus on the new stuff. The new stuff in this example will be a list of values like the words foo, bar, and baz. Much like a mathematician might go with x, y, z as placeholders, computer scientists would typically reach for words like foo, bar, and baz when they need nonsensical placeholders. And this looks like a list of three values, one after the other. Of course, if I go back into my directory index, click on list, how many list items am I going to see per line? Yeah. Well, it's going to be just one big blob of text here, too. It doesn't matter if it looks like a list. It is just going to be text after text after text separated by a single space, not the multiple lines I had. So, here too, we've got to be pretty pedantic. If I want a list of values, I need to use a tag that conveys that. And the tag I'll use first is going to be ul for unordered list, which gives me a bulleted list. And then inside of this unordered list, I claim we're going to have a whole bunch of list items, or li for short, like foo, like bar, like baz, or any other things that you want to put in your list.
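The unordered list just described could be sketched roughly as follows, assuming the same boilerplate as the earlier files.

```html
<!DOCTYPE html>

<!-- Sketch of list.html: a bulleted list of three placeholder values -->
<html lang="en">
    <head>
        <title>list</title>
    </head>
    <body>
        <!-- ul is the unordered list; each li is one list item -->
        <ul>
            <li>foo</li>
            <li>bar</li>
            <li>baz</li>
        </ul>
    </body>
</html>
```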
If I now go back to my other tab, reload, now you get the familiar bulleted list that you might see in any number of websites, Google Docs or the like. How does Google Docs do it underneath the hood? Well, they're just using a UL tag and some LI tags inside of that to give you the bulleted list that's just happening automatically when you click the appropriate button in something like Google Docs, which at the end of the day is just a website. Well, what if I want to number these things? Well, if I go back to VS Code, I could certainly just start numbering them like 1, 2, 3, which is fine, but honestly, computers can count, and with loops pretty quickly. Also, it's a little annoying. If I want to go back in later and insert something between some of those elements, I then have to renumber everything manually. I mean, this is one of the things computers are good at. So, take a guess. If I want not an unordered list, but an ordered list that is numbered, what might you change? Yes, O is a good bet. Let's change both the open tag and the close tag. Let me go back to my second tab. Reload. And now we have it: one, two, and three. And you can actually build a whole table of contents. You can use sub-bullets or sub-numbering. Anything you can do in like a table of contents, HTML can do for you automatically here. Well, what about tabular data? Laying out data in kind of rows and columns. Well, we can do that, too. Let me go ahead and open up a new file. How about table.html. Let me go ahead then in this file, copy paste as before, just so I have some boilerplate. Let's get rid of everything in the body. And then let's just manually whip up a little table like this. Open bracket table. Inside of the table tags, I'm going to have a TR tag for table row. Inside of this table row, I'm going to have a table data tag, which is going to have the number one. I'm going to give myself another, two, and another, three.
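For reference, the ordered-list change from a moment ago amounts to swapping ul for ol in both the open and close tags. This is only a sketch of the body of list.html, assuming the rest of the boilerplate is unchanged.

```html
<!-- Fragment for the body of list.html: ol numbers the items automatically -->
<ol>
    <li>foo</li>
    <li>bar</li>
    <li>baz</li>
</ol>
```

Inserting a new li between two existing ones requires no manual renumbering, since the browser assigns the numbers when it renders the list.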
Outside of the table row, I'm going to have another table row. And I'm going to create maybe four. And now I'm going to do five. And now I'm going to do six. And you can perhaps see where this is going. After this, I'm going to do one more table row. How about, a little tediously, seven. How about eight? How about nine? And then lastly, just to make it look a little familiar, a final table row. How about with a TD of an asterisk? And then how about a zero? And lastly, how about a pound symbol? Any guesses as to what we're making in HTML here? Like a telephone keypad. Yeah. So, let me close the old file, go back over to the browser, and click back. There's my new file, table.html. And it's not going to be very pretty, but I dare say that's exactly what you see when you pull up the phone app and you start dialing a number. It's sort of a numeric keypad laid out automatically for me in rows and columns. Now, this one's a little underwhelming. Let me open up a file that I made in advance of class today. In my favorites file here, I'm going to go ahead and copy a pre-made example. I'm going to open up this file called favorites0.html. And what you'll see here is a slightly more complicated table, still with a table tag, but this time with a thead tag for table head and then a tbody tag inside of which are all of those rows. And I know this just by having read the documentation. And then notice this. Inside of the first TR in the thead, there are three THs, table headings: timestamp, language, and problem, which might sound a little familiar from when we last collected data from everyone via that Google form. Well, let's go ahead and spoil what this is. Let me go back to the directory index. There is this pre-made file, favorites.html, and arguably a more compelling use of a table.
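A sketch of the keypad table built above, assuming the same boilerplate as the earlier files: one table, four rows, three cells per row.

```html
<!DOCTYPE html>

<!-- Sketch of table.html: a telephone keypad laid out as rows (tr) of cells (td) -->
<html lang="en">
    <head>
        <title>table</title>
    </head>
    <body>
        <table>
            <tr>
                <td>1</td>
                <td>2</td>
                <td>3</td>
            </tr>
            <tr>
                <td>4</td>
                <td>5</td>
                <td>6</td>
            </tr>
            <tr>
                <td>7</td>
                <td>8</td>
                <td>9</td>
            </tr>
            <tr>
                <td>*</td>
                <td>0</td>
                <td>#</td>
            </tr>
        </table>
    </body>
</html>
```

The favorites table extends the same structure: a thead element wraps a header row of th cells, and a tbody element wraps the data rows.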
Now, we have an HTML table containing all of the form submissions that you all clicked in with the other day when we were asking you your favorite language and your favorite problem. It's not super pretty, but indeed it's in rows and columns. And so, it's reminiscent of the HTML that Google is using in the actual Google Sheets software to lay out a sheet of data for you in those same rows and columns. All right. Well, let's do something that's a little more visually interesting. Let me go back to VS Code here and close out those last two. And how about let's do something with images? Well, I brought again, inside of today's code, our same bridge that we keep opening up in class. And this is the Weeks Bridge. Looks a little something like this. Here though is just the raw image. How could I include an image in a web page that I serve up on the internet? Well, let's go ahead and try this. Let me close the PNG itself. Let me copy this and create a new file called, how about, image.html. Hide my terminal. Copy paste that. Just quickly change the title to image so we know where we are. And inside of the body of this page, let's go and embed that image so that we can include not just the image, but if we want, paragraphs of text around it, headings as well. Heck, maybe a table, any other features that we've seen already. I'm going to say img, which is image for short. Source, src for short, equals quote unquote bridge.png. And then I'm going to close the tag here. Now I'm going to go back to my other tab. Go back into my directory index. Here's my brand new file, image.html. And this too isn't going to look all that different from the actual image because I have no other content. But when I click on this, you'll see that there is the full screen image. And it's even a little too big to fit in my viewport in the body of the page. But we can fix something like that later.
I've embedded in this website precisely that image. But I should do a little bit better here. In fact, if the image is slow to load or if someone uh is visually impaired and doesn't know what they're looking at, it would be nice to have some alternative text that something like screen reader software could recite. So, there's another attribute for this tag specifically called alt for alternative. And I can put something like Harvard University to at least give the user a textual description of what kind of photo they're looking at. You'll also see that text if indeed the image is slow to load or if it's broken, like missing altogether, you won't see 404. you'll see like a broken image icon, but at least with some explanatory text as to what the developer intended you to see at that point. It's not going to change at all if I reload here by going back to image.html, but again, a screen reader or an astute viewer would see that ultimately in the browser. But there's something different, and this isn't a mistake for once. What have I done differently, but apparently not wrong? I claim something new or noteworthy about this particular image tag. Yeah. >> Yeah. There's no like close tag. There's no like open bracket/ img which is the pattern we followed for every other tag like closing the HTML tag, the head tag, the body tag and so forth. I just don't see any end tag here. And it's just not necessary. Turns out there are certain HTML tags that can be empty elements, which is to say doesn't make semantic sense to start and end an image. Like it's either there or it's not. And so some tags just don't require an end tag if it's sort of obvious to the browser that the image should go there. So image is one such of those tags. And then I noticed um I'm missing the lang here, which isn't strictly necessary because I've got no textual content, but just for consistency, let me go back and put that in as before. 
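Putting the pieces above together, the image page might look roughly like this. It assumes a file named bridge.png sits in the same folder as image.html.

```html
<!DOCTYPE html>

<!-- Sketch of image.html; assumes bridge.png is in the same folder -->
<html lang="en">
    <head>
        <title>image</title>
    </head>
    <body>
        <!-- img is an empty element: it has attributes but no close tag -->
        <!-- alt is read by screen readers and shown if the image fails to load -->
        <img alt="Harvard University" src="bridge.png">
    </body>
</html>
```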
Meanwhile, the image is exactly as it would appear on the screen, but it doesn't have to be just an image we embed. We can do something with, like, video. So, let me go ahead and open up a file called video.html. Let me copy paste some of that starter code. Change this to video. And instead of the image tag, as you might imagine, there's also a video tag. It's a little more involved, but per the documentation, I know I can do this: video. And then inside of the video tag, I can actually have multiple sources just in case the browser might want different versions or different resolutions, sort of qualities thereof. And this somewhat confusingly is an actual tag called source, not shortened, but stupidly this tag has an attribute called src, which is shortened, that equals the name of the file you want to embed. And I came with today's examples, a video file called video.mp4, which is a small video that you can embed. And I can tell the browser what type of video it is to be clear. And the convention here, or content type, is to say the type of this video is an MPEG-4 video. There are other features though for the video tag. In fact, when you see a video on a page, you can very often see like a play icon, a pause icon, maybe some other controls. Well, it turns out you can put an HTML attribute on the video tag literally called controls that will enable those. If you don't turn them on, there's no way to start and stop the video, or rather see those controls visually. This way, the user actually sees them. But this attribute is a little bit different from others. It doesn't actually need a value. It just has to be present and the browser will know when it sees the word controls, oh, I should turn on the controls feature. And for good measure, especially in today's world of advertisements everywhere, if you want the video to play automatically potentially, or at least not annoy the user, you might want to mute it by default as well.
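A sketch of the video page as described, assuming a file named video.mp4 sits in the same folder. The type value video/mp4 is the standard content type for MPEG-4 video.

```html
<!DOCTYPE html>

<!-- Sketch of video.html; assumes video.mp4 is in the same folder -->
<html lang="en">
    <head>
        <title>video</title>
    </head>
    <body>
        <!-- controls and muted are boolean attributes: present means on, no value needed -->
        <video controls muted>
            <!-- A video element may list several source elements; the browser picks one it can play -->
            <source src="video.mp4" type="video/mp4">
        </video>
    </body>
</html>
```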
So another attribute per the documentation for the video tag is that you can start the video muted as well. And only when the user clicks on it might you actually start to hear something. But of course these are fairly basic examples of media inside of pages. Let's actually do what the H is meant to imply in HTML: the hypertext, the ability to link from one page to another. That is a feature we haven't yet seen. So let me go ahead and do this. And just for completeness, let me go back into hello.html because I completely forgot the language attribute, even though that's really just there for SEO, search engine optimization, or for tools like Google Translate or the like that know therefore what language they're translating from. Let me go into my terminal window here and let's create another file called link.html, which demonstrates exactly that, the ability to link from one web page to another. Let's go ahead here and change the title to link so I know where I am. And in the body of this page, let's go ahead and create what's called a hyper reference or hyperlink. I'll encourage people in this page to visit the actual Harvard website. So let's do: visit Harvard, period, just to demonstrate where we're beginning. If I go back into this directory index, click on link.html. This, of course, is not yet a link, so I should probably make it one. Well, instead of just saying visit Harvard, maybe I should say harvard.edu. Go back to the other tab. Reload. And it's harvard.edu, and I can click and highlight it, but it's not clickable. It's not underlined like a link. All right. Well, maybe I need to do like www.harvard.edu. Reload. Still nothing happening. All right. Well, maybe I need the full URL with the scheme, https, and maybe the slash at the end. Reload again and nothing's happening. So here too, HTML is pedantic. Like it will not create a link for you unless you tell it to create a link.
And the fact that when you post on social media nowadays or in Google Docs, things are automatically hyperlinked for you, like that's a feature implemented in code. Very often, Python or JavaScript or something else where some human wrote code that looks for patterns in the input you've typed in and if it looks like you've typed a URL, it will automatically link it for you. But what are those websites doing for you automatically? Well, they're doing this. If you want to have a link here to Harvard's website, you use open bracket a for anchor, href for hyper reference. Set that equal to the URL to which you want to link. Close the tag and then in between the open tag and the close tag, put the actual word you want to link. So now if I go back to this page and reload, now I have what looked like my original attempt, just visit Harvard, but it's a hyperlink. And this is super subtle, but if I hover over that underlined word, which is blue by default, you'll actually see in the browser's bottom left-hand corner where you're going to be whisked away to, even though that's all too subtle. But this now looks like I intended, an actual hyperlink to Harvard. In fact, I could make the link text the full URL, but it would be a little redundant. And even though this looks like you shouldn't have to do this, this is indeed how HTML works. The href attribute is where you're going to go. The text inside of the open and close tag is what the user will see. So if you want them to see the full URL, you've got to put it there. And now I can see the full URL to where I'm being led. But here's where you can actually introduce discussions of like cyber security. How could this feature be abused, might you think? This stupid simple feature. Yeah.
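The anchor tag described above might be sketched like this, assuming the same boilerplate as the earlier files. The href value is the destination, and the text between the tags is what the user sees.

```html
<!DOCTYPE html>

<!-- Sketch of link.html -->
<html lang="en">
    <head>
        <title>link</title>
    </head>
    <body>
        <!-- href holds the destination; the text between the tags is what gets underlined -->
        Visit <a href="https://www.harvard.edu/">Harvard</a>.
    </body>
</html>
```

Because the visible text and the href are set independently, nothing forces them to agree.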
have it display something but actually >> Yeah, you could have it display one thing but lead to somewhere else, and it wouldn't be that hard for the adversary who's maybe tricked you into visiting their web page to say you're actually going to go to yale.edu instead of Harvard. But if I reload the page, it doesn't look any different. Unless the viewer is astute enough to look at this tiny little text in the bottom of the screen, they just click on the link and are whisked away to the wrong destination. That can be problematic. Like this is a nice haha sort of prank. But you could certainly imagine doing this with like paypal.com addresses or any number of banks or anything where you're trying to collect personal information from someone. And if the resulting website looks quite like the website they're expecting, but it's actually your copy thereof, it's all too easy to wage what are called phishing attacks. P H I S H I N G, which means to lead someone to what looks like the real site, but is not. Typically, to get their username, their password, their credit card information, or something else. But it boils down to just these basic building blocks like this. Questions then on any of these building blocks that we've seen thus far? Yeah. >> I think I might have gotten lost in the earlier portion. >> Sure. >> How did you get it to open up? Like did you run the file in >> Oh, good question. How did I get it to open up? So, let me rewind. So, the very first thing we did after creating hello.html was open a terminal window, and specifically I ran a command, http-server, which starts my own web server in my codespace, but not on the default ports 80 and 443 because that's what cs50.dev is already using. Instead it chose by our design 8080, which is commonly used by developers when making websites. Then I just kind of hid my terminal because it's not interesting to see constantly.
But that web server is still running in my codespace. And anytime I'm saying let's go back to this tab, I am now visiting a different URL that was the result of my clicking on that green button which led me to my own website. If you ever get lost or close that tab by accident, no big deal. If you go to the ports tab of your terminal, you can actually hover over this and click on that same URL and open up the contents of your own site instead. >> Fluffy meme. Yes, these are randomly generated names by GitHub, which is the company that hosts VS Code in this way. And they do this to ensure uniqueness without it being some arcane sequence of random letters and numbers. They concatenate random English words together. A good question. All right. So, what else can we do here? Well, let me propose that there's a bit more you can do with even these URLs. Here, of course, is the scheme and the host name and the domain and the TLD. But after that, things can get a little more interesting than just folder names and file names. In fact, it's quite common to see URLs that have somewhere in them a question mark and then a bunch of other key value pairs, which is this omnipresent computer science thing, it seems, including in the context of URLs, whereby if you want to pass an input to a web server, one means by which you can do that is literally in the URL itself. So for instance, if you visit google.com and you want to search for something, you and I are all in the habit of course of just typing into a search box. But how is that search box actually getting the data into Google's servers? Well, it's via these URLs. And if there's not one input, but two inputs, the URL might be a bit longer and there might be one or more ampersands in the URL that just separate more key value pairs. And it turns out we can see this in the real world as follows. Let me go back to VS Code here. Let me open up a new tab. And let me open up google.com.
And I'm just going to hit enter on the shortest way of saying it. So, I get to Google's homepage here. Even though notice I ended up at some longer form of the URL. In fact, I'm going to delete everything else from the URL that's not relevant to us today. It's still forcibly coming back. So, Google is somehow trying to track me by putting that in there. That's fine. All I'm going to do is search for cats. Now, there's a whole bunch of other functionality that's clearly happening, like autocomplete, and it's trying to figure out what results or words I might want. I'm just going to go ahead and hit enter. And this is all to say that notice if I zoom in on the URL at the top of my screen, it's a crazy long URL because Google probably is doing a bunch of tracking and advertising and analytics technologically, none of which is relevant to us today. But notice after www.google.com, there's /search, which is the path on their server, the search program that someone there has written. There's a question mark and then there is an HTTP parameter, as these things are called, the more precise name for key value pairs in URLs. This is an HTTP parameter, q. Its value after the equal sign is in fact cats. All this other stuff I have no idea what it is. I'm going to just delete it and hit enter and it stays gone. But I still get cats in my search results. So this I would argue is sort of the canonically shortest form of a Google URL that's useful. In fact, if I want to search for dogs instead, I don't have to use the search box. I can literally manually make my own URL, hit enter, and if I zoom out, there are Google search results about dogs. So, this URL, too, is sort of the essence then of how URLs work. And specifically, the GET verb, which was that keyword in all caps that I claimed was inside of the envelope, and it's what Phyllis's browser was sending, and it's what my browser has been sending through all of these examples. But here's where things now can get interesting.
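As an illustration only, the hand-built search URL from this demo could be dropped into an anchor tag. The lecture types the URL into the address bar rather than writing this markup, so the link text and page fragment here are assumptions.

```html
<!-- Fragment for the body of a page: a link whose URL carries one key value pair, q=dogs -->
<!-- /search is the path, ? starts the parameters, and & would separate any additional pairs -->
<a href="https://www.google.com/search?q=dogs">Search Google for dogs</a>
```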
If I know how Google's server works, its backend, the part that knows all about cats and dogs on the internet, I can implement my own front end by just knowing a bit of HTML. So, let me actually go back into VS Code here. Let me go into my second terminal, which is blank, and let me go ahead and create something called search.html. I'm going to go ahead and copy my original code, close link, and paste it here. Hide my terminal. Call this thing search, and then inside of the body of this page I'm going to make my own version of Google here. I'm going to use a form tag and I'm going to in that form specify an input tag whose name is going to be exactly equal to what I saw Google uses, q, which happens to stand for query. I am then going to add another input. The type of this first box, this input, is going to be text. The type of this next one is going to be a submit button. And then that's it. Let me go back into my other tab. Go back into my directory listing. Click on search.html. And this is not pretty, but it is the beginning of my very own search engine. Unfortunately, if I type in cats, notice what happens. My URL changes such that it's search.html question mark q equals cats. I know nothing about cats. I don't have a database of cats. I haven't done any backend work, just the front end. The front end is what the user sees. The back end is what provides data to the front end. But why don't I tell this form not to submit to me. Let's say that its action should actually be to go to https://www.google.com/search, which is the URL that I saw in my browser. I'm just inferring how Google works. I'm going to be pedantic even though this is the default. I'm going to say the method I want my form to use is get. Confusingly, it should be lowercase here, even though inside of the envelope it will be all caps. And then I'm going to go back to this page. Reload after going back.
And you'll see the same exact box, but when I search now for cats, submit, notice my URL changes to Google's own. It's like voila. Like I just implemented my own Google without doing the actual hard part. I've actually just done the more simple front end. And there's a few other things I can do here that are sort of nice. I can change the type to be a search box. I can change the value of my button, not to be the default, which notice was submit. I can say Google search. And I can keep tweaking this to make it even prettier and prettier here. Now in my version is now a box that has uh cats. Notice that it's trying to complete my thought. I can actually go back into the form. I can say autocomplete equals off to turn off that feature. So now if I click in this box and type Oh, autocomplete equals off. Why is it still there? >> Did I forget to refresh? Oh, thank you. I forgot to refresh. Hence my point. So you always have to reload after making a change. And now the autocomplete feature is off. And this other little thing, it's subtle, but this little X that will just clear the whole thing. That is simply the result of having changed text to search for the type of that box. Um, there's other things you can do too for accessibility or user friendliness. I can do auto uh focus here for instance without any attribute or without any value. If I now reload this page, notice that the cursor is automatically blinking in the text box, which is a marginal change, but much easier for me to now type cats without having to stupidly click in the box in order to actually foreground it so I can type input. So, suffice it to say, this is not really the business that Google is in. They do much more on the back end than they do on the front end. But with just these basic building blocks, can I implement the beginnings of the same website? In fact, let me do one other flourish. You'll see that that text box is blank. Not clear what I might want to do. 
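Collecting the attributes discussed so far, the search form might look roughly like the sketch below. It places autocomplete and autofocus on the input element, which is one valid placement and an assumption about the lecture's exact file.

```html
<!DOCTYPE html>

<!-- Sketch of search.html: a front end that hands the query to Google's back end -->
<html lang="en">
    <head>
        <title>search</title>
    </head>
    <body>
        <!-- action is where the form submits; method="get" puts the inputs in the URL as ?q=... -->
        <form action="https://www.google.com/search" method="get">
            <!-- name="q" matches the parameter Google's URL uses; type="search" adds the clear button -->
            <input autocomplete="off" autofocus name="q" type="search">
            <!-- value overrides the default label on the submit button -->
            <input type="submit" value="Google Search">
        </form>
    </body>
</html>
```

Typing cats and submitting should send the browser to a URL ending in /search?q=cats.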
Well, there's another attribute I can use. Placeholder equals something like query. I can at least tell the user what to search for. If I reload again, now I see in gray text the word query as instructions so that I roughly know what now to type. So all these things that you see every day on websites are really as easy as just coding up some HTML like that. But what else can we do with HTML? Well, it turns out this is a topic for another longer day too. There exist in computing what are called regular expressions, which is a fancy way of describing patterns, which are quite useful when you want to validate input. For instance, if you want the user to have to type in an email address with the at sign, with the TLD, and so forth, it would be nice to make sure that they get a warning if they try to skip that field or they mistype something in it as well. With the world of regular expressions, known in short as regexes, you have a whole bunch of documentation here that in a nutshell will introduce you to some pretty powerful syntax that we won't spend much time on at all today, but it's syntax that exists not only in the world of the web, but in Python and so many other languages as well. So consider this just a quick crash course. If you want to define a pattern in, say, a website that ensures that the user types in an email address, you can use these textual building blocks whereby in the world of regular expressions, a single dot represents any character. If you don't care what the character is, dot confusingly doesn't represent a period, it represents any character. Star represents zero or more times. Plus means one or more times. Question mark means zero or one time, if you want something to be there or not. Curly braces with a number n means this many times, and you can even have a range of values instead. And then you can use square brackets and some other syntax to say I want the user to type in any of these characters or digits in this case.
Or you can do ranges like this. I want them to type in any decimal digit between 0 and 9, or backslash d represents any digit. Backslash capital D means anything that's not a digit. Long story short, humans over the years have come up with shorthand notation known as regular expressions via which you can define patterns. This is useful because if I wanted to make a web page that does in fact require that someone type in, say, an email address, I can enforce that to some extent. If I go back to my browser here and into VS Code, let me go ahead and create a new file called, say, register.html to be representative of registering for some website. I'll change the title here real quick. I'm going to keep the form, but in this case, I'm not going to bother with Google anymore. So, let's make it a bit simpler than before. And let's go ahead and do this. Inside of the form, I'm going to have an input. I'm going to have the name of this input be email because that's what I'm collecting. I'm going to have a placeholder be quote unquote email so the user knows what to type in. And I'm going to go ahead here and have a pattern as well. So actually let's say type equals text, but I'm going to specify additionally a pattern. So the pattern I want the user to type in, in between these quotes, is going to be any character one or more times, that is to say their username, then an at sign, then any character one or more times, then literally a period. And we didn't see this on the screen, but just like in C when you want to escape special characters, if you want literally a period in their input, like the dot in harvard.edu, you can say backslash period to mean a literal period, and then the TLD, edu. So I think that's what this means, and let me go ahead and give myself a button, and just so you've seen it, there's also a button element in HTML which is similar in spirit to the submit button we saw a moment ago.
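The registration form described above might be sketched as follows. The button label Register is an assumption, and the pattern is the simplified one from the lecture, which only accepts addresses ending in .edu.

```html
<!DOCTYPE html>

<!-- Sketch of register.html with client-side pattern validation -->
<html lang="en">
    <head>
        <title>register</title>
    </head>
    <body>
        <form>
            <!-- .+ is one or more of any character, @ is literal, \. is a literal period, then edu -->
            <input name="email" pattern=".+@.+\.edu" placeholder="email" type="text">
            <!-- A button element inside a form submits it by default -->
            <button>Register</button>
        </form>
    </body>
</html>
```

The browser treats the pattern as needing to match the entire value, so input without an at sign or without a trailing .edu is rejected before the form submits.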
Let me go back to my directory listing, go into register.html, and let me go ahead and just type in like mail as my name, click register, and you'll see: please match the requested format. So I have not satisfied it properly until I actually type in a full address, with a username, an at sign, and a domain ending in .edu, and now it's happy. Alternatively, it's a little tedious to actually type in these patterns. So, there are some shorthands for them. I can actually get rid of this pattern. And if I read the documentation for HTML, there is actually an input of type email which just does all of that pattern matching for you. But the scary thing is that it's actually pretty involved to validate email addresses. I did a very simplified version of username at domain.tld. This is the regular expression that some browsers use to validate email addresses, because even though mine is relatively simple, it turns out there's a crazy amount of syntax that is valid in email addresses. And this is where regular expressions get scary. But for our purposes today, they're a thing that exists. You might find them useful in HTML. You might find them useful in Python. They're incredibly useful when it comes to extracting information from web pages. If you're analytically minded, you like the world of data science, you like to gather and analyze data, you can use regular expressions not just to validate data but to find patterns of data in actual websites or documents and extract that data so as to perform operations or analysis on them. So it's a wonderfully useful, if complicated, tool. The catch though is this. Notice that here I'm still required to type in a valid email address when I click register, and I'm getting even more explicit information this time because I used type equals email. The catch though with web pages is that they're not to be trusted, insofar as this HTML came from the server and is downloaded onto the user's Mac or PC or phone, where they have a copy thereof.
I can open up developer tools as I did before by right-clicking or control-clicking and choosing inspect or whatever the menu option might be. I can go into the elements of this page, literally the HTML, and if I don't want to type in an email, I want to just type in any old text and see if I can break your site, I can just change it. And now there is no such warning. Which is to say, even though you will encounter, not just today, but over the coming weeks as you play with HTML, certain features, they are not to be trusted in general when it comes to security. And just like our discussion in the world of SQL and SQL injection attacks, this is one of the attack vectors. If two people are working on a website, one person's implementing the database stuff, one person's implementing the HTML, and the database person's like, "Oh, I don't need to worry about escaping characters because we're using the pattern attribute in the HTML." Bad idea, because it's this easy to hack a website, to disable features that have been written for the site by just literally deleting them in your own copy. So, we'll see next week how we can defend against this on the server side, but the point now is just not to trust the user's input at all. All right. How can we be sure our HTML is right? Well, there's a bunch of ways, but one tool that's worth knowing about is this one here at validator.w3.org, a website by the group that essentially standardizes this and other languages. If I click on their validate by direct input tab and I quickly go back into VS Code and let me grab the simplest of my examples, hello.html, I can just copy paste that into their website. Click check, and they have written code to validate that the HTML I have written is in fact correct. Anything I've opened that needs to be closed has been closed. I don't have any stupid typos or missing brackets or quote marks. This is a wonderfully useful tool just to validate that your code is syntactically correct.
Even though it might still look like a mess visually on the screen, this will at least check for you the underlying HTML. All right. So, up until now, everything I've done has been pretty boring. It's black and white. The pages are fairly simplistic. Turns out we can take things the final mile using another language altogether, namely something called CSS, which is the second of our three languages today. This too is not a programming language, although curiously, they keep adding more and more features that are making it more and more like a programming language, but more on that another time. This stands for Cascading Style Sheets. And whereas HTML is all about the skeleton of a website, the structure thereof, CSS is like the skin, the aesthetics thereof, the final mile that actually allows you to control the positioning of things more precisely, the colors, the font sizes, all of the aesthetics. It lets you do the finer touches on the website. And with CSS, we have slightly different syntax, but frankly, it just boils down to even more key value pairs. And as with HTML, we'll give you a taste of the basic structure and principles underlying CSS. There are so many key value pairs that are possible that we certainly won't do them justice today, but it's the kind of thing where you ultimately look it up in a reference, a book, a website, or the like to pick up even more than these techniques. Well, let's do this. Let me propose that in a moment we're going to see what are called properties. This is CSS's jargon for key value pairs. Why do we have yet another word? Because a different group of humans in a different room came up with this language versus the other people. But it's just key value pairs, now known as properties instead of as attributes in HTML itself. There's going to be different ways we can define properties and this is kind of a laundry list of some of them and we'll see them in context.
But in short, CSS is just going to allow us to slap a whole bunch of key value pairs on our HTML elements to make them hopefully look prettier or be more precisely controlled aesthetically. So, in my HTML, thus far, we've generally had something that looks like this. Turns out, if I want to start using some CSS, I can introduce, as we'll see, a so-called style tag in the head of my page. And inside of that style tag, I can put these so-called key value pairs. Or, as we'll soon see too, if I want to factor them out and put them into a separate file, I can actually use a link tag, which confusingly has nothing to do with hyperlinks or clickable text, but just links in another file. In this case, styles.css, the relationship of which shall be that of stylesheet. This is the sort of copy-paste stuff that you do where the only thing you really care about as the developer is the name of the file in which you're putting your styles. All right, let's do this. Let me go back over to VS Code, close out register.html, open up a new file this time called home.html, and let me purport to make a simple homepage for someone like John Harvard. I'll copy-paste my boilerplate. I'll change the title here just to be, let's say, home. And then inside of the body of this page, let's do the simplest web page possible for someone called John Harvard. I'm going to say here's a paragraph of text, where John Harvard is going to be the person's name. Here's another paragraph of text. Welcome to my homepage will be in the middle of this page. Then a final paragraph of text inside of which is like copyright, (c), how about, John Harvard down here. So, it's a basic website. It's just three paragraphs. It's not going to be pretty, but let's make sure I haven't done anything wrong. Let me close my developer tools. Click back. Click home. And there we have it. The simplest of pages for John Harvard. Welcome to my homepage. Copyright John Harvard.
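The two places for CSS described here, a style tag in the head or a separate file pulled in with a link tag, might look roughly like the sketch below. The exact wording of the three paragraphs and the single text-align property are assumptions for illustration, and styles.css is assumed to exist alongside the page; a real page would normally pick one of the two options rather than both.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>home</title>

        <!-- Option 1: properties written directly in the head of the page -->
        <style>
            body { text-align: center; }
        </style>

        <!-- Option 2: properties kept in a separate file and linked in -->
        <link href="styles.css" rel="stylesheet">
    </head>
    <body>
        <p>John Harvard</p>
        <p>Welcome to my home page!</p>
        <p>Copyright (c) John Harvard</p>
    </body>
</html>
```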
Let's at least start to exercise some control over this. Let's change the font size and the alignment of the text. So, back in VS Code, let's go ahead and add, for now, actually, not even a style tag, but a style attribute. I'm going to go ahead here and type in style equals, quote unquote, font-size colon large semicolon, and then text-align colon center semicolon. And I apologize, but semicolons are back in CSS. Then, in my next paragraph's open tag, let's do something similar, but different: font-size colon medium, for medium, then text-align colon center semicolon. And then lastly down here, let's do style equals, quote unquote, font-size colon small, because it's the footer, so who cares? Text-align colon center semicolon. Strictly speaking, at the last key value pair, otherwise known as a property, you don't need the semicolon, but just for consistency, I'll keep them. All right, let's go back to this page, reload, and watch. All of the text a moment ago was left aligned and the same size. Now, it's a little subtle, but it's clearly centered, and it's large, medium, and small, respectively. Even if you've never seen CSS before, what rubs you wrong about this design, though, based on all weeks past? Yeah. >> Yeah. For every line, I've been repeating myself with text-align center, text-align center, text-align center. And if we really want to nitpick, these aren't really paragraphs, right? There are no phrases or full sentences, let alone paragraphs. So, it turns out there's a whole bunch of tags we can use to lay out a page. And in fact, I'm going to transition to one that's a little more generic than paragraphs, namely div, which is just going to create a division in the page for me. And this doesn't have any functional impact, but semantically it's a little nicer because it means I've got the division here for the header, the division here for the main part, and the division down here for the footer. It's just a different way of thinking about it.
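Written out, the style attributes dictated above would look something like this sketch. The paragraph text is assumed from the earlier description of the page; the property names and values are the ones spoken aloud.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>home</title>
    </head>
    <body>
        <!-- Each attribute value is CSS: property, colon, value, semicolon -->
        <p style="font-size: large; text-align: center;">John Harvard</p>
        <p style="font-size: medium; text-align: center;">Welcome to my home page!</p>
        <p style="font-size: small; text-align: center;">Copyright (c) John Harvard</p>
    </body>
</html>
```

Seeing text-align: center three times in a row is exactly the duplication the question to the audience is fishing for.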
It's just different rectangular swaths of the page. But I like your point that text-align center is kind of stupidly duplicated all of these times. Let me actually go ahead and first reload this change because there is one side effect that we might want to get back. When I reload now using divs instead of paragraphs, well, there goes the nice white space in between my text. Divs just give me rectangle after rectangle. And as an aside, let me control-click or right-click, open up developer tools yet again, and notice this other trick with your Elements tab. Whatever you hover over at the bottom of your screen will be color-coded at the top of the screen. So if I dive into the body by clicking this little triangle, let me zoom in. At bottom left, I can now see my own HTML pretty-printed and colorful down here. If I click on this one or hover over it, you'll see that the first div, the rectangular region, is highlighted. Now the second, now the third. That's all we mean by divisions of the page. This allows me to see my copy of it in the browser as opposed to in the original file. So just another technique for developer tools. All right, but I don't like this duplication, and here is now the C in CSS. Cascading Style Sheets means that if you want one property or key value pair to sort of cascade down on all of the other tags inside of that one, you can do that. For instance, in the body tag, I can add my own style attribute here and put all of that text-align center there. Why? Because the divs are the three children of the body tag, to borrow our vernacular from family trees and from trees more generally. So, this too should work because text-align center should cascade down now on all three of those children. And indeed, if I reload the page, nothing visually changes, but it's arguably now better designed. All right, what more could we do here? Well, how about this?
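A sketch of that cascading version, with the centering stated once on the body and inherited by its three div children (the text content is again assumed from the earlier description):

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>home</title>
    </head>
    <!-- text-align is set once here and cascades down to every child below -->
    <body style="text-align: center;">
        <div style="font-size: large;">John Harvard</div>
        <div style="font-size: medium;">Welcome to my home page!</div>
        <div style="font-size: small;">Copyright (c) John Harvard</div>
    </body>
</html>
```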
It would be nice to make clear to servers out there, like search engines, what's going on in the page semantically. And the term of art out there nowadays is the semantic web, which essentially is about putting more hints in your HTML so that servers like search engines know more about what they're looking at. This is pretty generic right now. Div, div, div. But presumably the top of the page is among the most important things because that's effectively like the header of the page. Then the middle div is kind of the second most important because it's like the main part of the page, and the footer is like the least important. So it turns out there are other tags in HTML besides paragraphs and divs. There are literally tags like header, which allows me to define the header of the page, main, which allows me to define the main part of the page, and then even footer, which allows me to define that too. So now if Google and Bing and other search engines are crawling my website once it's public, they know that John Harvard's important because it's in the header, welcome to my homepage is important because it's in the main part. They're probably not going to care as much about the copyright because it's in the footer. So it's just providing more hints to these kinds of services. Moreover, we can do some other things here. This is kind of a hackish way to implement a copyright symbol. HTML also has what are called entities, where I can do this magical incantation here: ampersand, hash symbol, 169, semicolon. Notice that VS Code recognizes this as an HTML entity. If I go back to this page, notice my first approach was just parenthesis, C, parenthesis. If I reload now, having used that HTML entity, which I only know by having looked it up, now I get the copyright symbol that actually comes in the font that's being used here. All right, so let's transition now to this approach whereby I claimed before that you can actually use a style tag.
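Put together, the semantic tags and the entity described here might look like the following sketch. The style attributes are carried over from the previous step, and the text content is assumed.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>home</title>
    </head>
    <body style="text-align: center;">
        <!-- header, main, and footer behave like div but say what each region is -->
        <header style="font-size: large;">John Harvard</header>
        <main style="font-size: medium;">Welcome to my home page!</main>
        <!-- &#169; is the numeric entity for the copyright symbol -->
        <footer style="font-size: small;">Copyright &#169; John Harvard</footer>
    </body>
</html>
```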
And why might we want to do this? Well, looking back at my code here, this is sort of hinting at potentially bad design. Even though there are different arguments for and against this, right now I'm sort of co-mingling my data with my presentation thereof. Like John Harvard, welcome to my homepage, and copyright such and such is sort of the data I care about. But I'm mixing in the stylization of all of this stuff by putting CSS and HTML in the same place. So to be clear, all of the green stuff and, well, everything we've seen thus far, the tags and the attributes, that's all HTML syntax. Everything between the quotes is now CSS. And we've seen this before only in the sense that we've used SQL inside of Python code. Here we're using CSS inside of HTML code. But the CSS syntax is everything thus far inside of those quote marks. Wouldn't it be nice to kind of factor that out so that I can see it all in one place and, better still, factor it out ultimately to another file? And I can do this as follows. Let me in my home.html get rid of all of these style attributes and really whittle the page down to its essence, whereby I just have the header, main, and footer tags, inside of which is that content. It's already easier to read, at least for me the human. Inside of my head tag now, though, let me go up and say style, and inside of this new style tag let me show you another approach for stylizing the page. Up here is where we can actually select elements to operate on using what are called selectors. So if I want to modify the style of my page's body, I can do that by typing body. And then, I'm afraid curly braces are back in CSS too. I can put text-align center up here. And the fact that I've put the word body before those curly braces just means all of these key value pairs, one in this case, will operate on the body. Meanwhile, down here, I can say the header is going to have font-size colon large.
The main part of the page is going to have font-size colon medium. And then lastly, the footer of the page is going to have font-size colon small. You know, definitely more lines now, which isn't the best, but the effect now if I go back to my browser and reload is visually pretty much the same. I've just relocated all of those key value pairs elsewhere, but as a stepping stone now for doing something a little smarter, whereby I can lay the foundation for putting this in another file altogether. But first, let me note this too. The fact that I've put all of these key value pairs in association with specific HTML tags doesn't really make them very reusable. And so, as I alluded to earlier, these properties can be applied to different selections of HTML: type selectors, class selectors, ID selectors, attribute selectors. Let's just give you a little taste of this. What do we mean? Well, suppose that I want to generically be able to use text-align center without associating it only with the body. Maybe I want to use this for a larger project where I want to center many things on the page. I can define my own keyword like the word centered, which doesn't exist per se, but if I prefix it with a dot, what I've just created is what's called a CSS class, and a class is just a set of key value pairs, properties, that you can associate with any HTML tags. Meanwhile, if I want this key value pair to be associated with the notion of large, I can define .large, I can define .medium, and I can define .small down here, the motivation for which is that now in my page, if I want to center the body, oops, let me fix my own typo. If I want to center the body, I can say please use the class known as centered on this tag. And then on the header, I can say please use the class known as large on this tag. And then please use the class called medium here. And then lastly, use the class called small here.
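As a sketch of the class-based version just described: the class names centered, large, medium, and small are the ones invented in the lecture, each defined once in the style tag with a leading dot and then applied through class attributes without the dot. The text content is assumed.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <style>
            /* A dot before the name makes it a class selector */
            .centered { text-align: center; }
            .large { font-size: large; }
            .medium { font-size: medium; }
            .small { font-size: small; }
        </style>
        <title>home</title>
    </head>
    <body class="centered">
        <header class="large">John Harvard</header>
        <main class="medium">Welcome to my home page!</main>
        <footer class="small">Copyright &#169; John Harvard</footer>
    </body>
</html>
```

The earlier type-selector version would be the same page with body, header, main, and footer as the selectors (no dots) and no class attributes at all.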
So now, in the spirit of a lot of the modularization we did in Scratch and in C and Python of making your own functions, classes aren't functions, but they are a way to encapsulate one or more properties and use or reuse them anywhere you want in a web page. It's not that impressive here in this short one, but it lays the foundation for doing much more interesting things soon down the road. In fact, let's take a step in that same direction. Let me go ahead and now highlight everything I've put inside of this style tag and cut it onto my clipboard. I'm going to get rid of the style tag altogether. I'm going to quickly create a new file, home.css, and I'm just going to paste all of that stuff in there. And just to be nitpicky, I'm going to de-indent it so it's all left aligned. So all I've done is just move everything I just wrote into a new file called home.css. I'll close that. Out of sight, out of mind. But what I'm going to do now in the head, instead of a style tag which contained all of that clutter, I'm going to say link href equals home.css and then this rel attribute, which just means the relationship of this file to this one should be that of a stylesheet. And this tag too does not need to be closed. It just is. And now if I go back here and reload, still no changes other than the tweak to the font a moment ago. Still no changes. But now it's better designed, with that file completely separate. So where are we going with this? Well, just to kind of circle back to something we did earlier, let me open up my terminal window. And recall earlier we had this file like favorites0.html. And this contained all of the data from a couple of weeks back that we solicited via that Google form. And recall a bit ago when we went into favorites0.html, I mean, it was just kind of an ugly table structure. But it turns out in the world of HTML and CSS, there are also what we're going to call frameworks, which is a fancy word for library.
But a framework is sort of a way of doing something by using someone else's library. And to do it their way, you just read their documentation and then you adopt their functions in the case of code, or you adopt their CSS classes in the case of this example. So, one of the most popular frameworks out there nowadays, and among the simplest and best documented, is one called Bootstrap, which is a set of CSS classes and other features that you can use in your own code because it's open source. And in fact, all of the documentation is at this URL here. I read the documentation before class and I copied really the one line of code that I need to make favorites.html even prettier. So, let me go back into VS Code and let me copy my pre-made example from earlier. And you'll see that in favorites, whoops, favorites1.html, I have all of the same code, all of those lines of everyone's submissions. But notice I've added now this link tag. And it's a little longer than the one I wrote. It's referencing a third-party website, jsDelivr, which is a CDN, a content delivery network, which is to say a server that just serves up content for other people to use. But I copied that from Bootstrap's own documentation. And what I did here is the following. I added a class to my table tag, specifically with a value of table followed by a space and table-striped. Why? Well, I read Bootstrap's documentation at that previous URL and I liked the look of their tables because it lays it out with nice stripes like white and gray and white and gray and it formats everything quite a bit nicer. So, if I go into this version in my second tab by going back first and now opening up favorites1.html, same exact data, two lines of change, and voila, now we're talking. This looks much more like a table that you would see on any pretty website like your Gmail inbox or the like, all by simply changing the CSS and not really the HTML at all.
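A sketch of the two changes described here. The table and table-striped classes come from Bootstrap; the link tag below points at Bootstrap's stylesheet on jsDelivr, but the version number is an assumption, and the exact tag is best copied from Bootstrap's own documentation as the lecture does. The column names and rows are made up for illustration, not the actual form data.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <!-- Version is an assumption; copy the current tag from Bootstrap's documentation -->
        <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css" rel="stylesheet">
        <title>favorites</title>
    </head>
    <body>
        <!-- Two of Bootstrap's classes, separated by a space -->
        <table class="table table-striped">
            <thead>
                <tr>
                    <th>Language</th>
                    <th>Problem</th>
                </tr>
            </thead>
            <tbody>
                <!-- Hypothetical rows standing in for everyone's submissions -->
                <tr>
                    <td>Python</td>
                    <td>Hello, World</td>
                </tr>
                <tr>
                    <td>C</td>
                    <td>Mario</td>
                </tr>
            </tbody>
        </table>
    </body>
</html>
```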
So, the motivation for introducing those classes a moment ago was so that we can have reusability of code. And better still, we can start to stand on the shoulders of others by using code that other people have written in order to improve the aesthetics of our own websites as well. All right, how about a couple of final flourishes with some style? Let me close out these examples here and let me propose to go into, how about, that same link example from earlier. So, let me reopen link.html, which recall had this phishing attack at the time. I'm going to revert this to the safe version and just say visit Harvard at Harvard's actual URL. Suppose I wanted to stylize this link beyond the default. Well, let's see what it looks like by default. If I go back into link.html, this is what it looked like before, blue and underlined by default per the browser's decision. But I can override that in any number of ways. To keep things simple, I'm just going to stay in my same file now rather than be pedantic about moving it to another file. And if I want to stylize the anchor tag, just as before, I can say a, and then in some curly braces here, I can do something like this: color colon red, if I want to make it crimson-like instead. Let me go from VS Code to my other tab, click reload, and now we have a red link. I can really geek out. And if you remember your hexadecimal codes from our discussion of images a few weeks back, I can do hash FF0000, which is a lot of red, no green, no blue. And if I go back to my other tab, click reload, same exact thing. You have that much control over even the color codes that you might use. Maybe you don't like the underlining in this particular case. Well, that's fine. I can do something like text-decoration none per the documentation. I can reload and gone is that underline. Maybe it'd be nice to hover over the word and then see the underline. Well, I can do that, too.
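To make "a lot of red, no green, no blue" concrete, here is a small JavaScript sketch that splits a six-digit hex color into its three components. The function name `parseHexColor` is hypothetical, written only for this illustration; browsers do this decoding themselves when they read the CSS.

```javascript
// Split a CSS hex color such as '#FF0000' into red, green, and blue (0 to 255 each).
function parseHexColor(hex) {
    // Two hexadecimal digits per component, case-insensitive
    const match = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
    if (match === null) {
        throw new Error('expected a color like #RRGGBB, got: ' + hex);
    }
    return {
        red: parseInt(match[1], 16),
        green: parseInt(match[2], 16),
        blue: parseInt(match[3], 16),
    };
}

console.log(parseHexColor('#FF0000')); // red: 255, green: 0, blue: 0
console.log(parseHexColor('#00FF00')); // red: 0, green: 255, blue: 0
```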
Turns out I can have these pseudo selectors whereby I say the name of the tag, then a colon and a keyword like hover, which browsers know to recognize. And when I hover over an anchor, what I want to do is change the text-decoration to underline temporarily. If I go back to this tab now, reload, looks the same, but as I move my cursor over, notice that it's underlining it for a visual effect. Let's see what's going on with my developer tools. If I right-click anywhere and choose Inspect, notice a detail I haven't shown before under the Elements tab here. If I go down to my link here and make the right-hand pane a bit bigger, here all this time, but ignored up until now, has been this part of developer tools whereby I can actually see all of the CSS that applies to the element I have just selected, namely this link. And I see here in nice pretty-printed fashion that I'm using this color #FF0000 and text-decoration none. Why is this useful? Well, one, if you want to learn from another website how it's doing its thing, you can just look at the CSS. But also, if you want to be able to iterate more quickly and just kind of tinker with things, I can actually turn the color on and off by just hovering over the inspector here and clicking and unclicking. And if I want to just play around with, oh, maybe Harvard should be #00FF00, enter, I can make it green instead. So, you can temporarily change the browser's copy of your own HTML or CSS just to tinker and iterate quickly, just like I tinkered with Stanford's own website, or at least my own copy thereof. Lastly, how about in terms of these selectors? These are using type selectors, that is, selecting by the name of the tag. If I want to actually affect one tag specifically, a very common convention is to give an HTML element a unique ID. For instance, I'm going to call this Harvard.
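A sketch of the link styling built up over the last few steps, including the hover pseudo selector and the ID-based selection being introduced here. The lowercase id value harvard and the link text are assumptions; the hash appears only in the CSS selector, never in the id attribute's value.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <style>
            /* #harvard selects the one element whose id is harvard */
            #harvard {
                color: #ff0000;
                text-decoration: none;
            }

            /* Applies only while the cursor is over that element */
            #harvard:hover {
                text-decoration: underline;
            }
        </style>
        <title>link</title>
    </head>
    <body>
        Visit <a href="https://www.harvard.edu/" id="harvard">Harvard</a>.
    </body>
</html>
```

Replacing #harvard with a in both rules gives the type-selector version, which would style every link on the page rather than just this one.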
And by the honor system, I should not give any other element in this page an ID of Harvard. The motivation is that I can now uniquely identify this tag by, for instance, changing this to hash Harvard, which is just the convention for specifying that it's not a class now. It's instead an ID. You do not put the hash, though, in the actual value down here. And what I can even do down here is something like hash Harvard colon hover to scope that as well. If I now reload, we're back to the red version and the same functionality as before. And it's just a more precise way now to target your CSS properties to a very specific element instead. Okay, [sighs] that was a lot. Any questions on any of this thus far? No. That clear? All right. Well, one last language for the day. And we do mean what we say: that is the extent to which you will formally learn HTML and CSS. Everything else just follows those exact same patterns. It's different classes. It's different attributes. It's different tag names. All of which can be picked up through practice, through osmosis, through references. But that's really it for the fundamentals. And so our last focus today is on an actual programming language that we'll just scratch the surface of, if only because it's so darn omnipresent nowadays. Most every website you use is made from not only HTML and CSS, but if it's in any way interactive, odds are it's using JavaScript, a programming language that is very commonly used client side, whereby humans write the code on the server, but then your browser, as before, downloads it to the client and then it runs on your own Mac, your PC, or your phone. That said, JavaScript is also very popular on the server nowadays. It's not just a browser-based language. In JavaScript, what you have most powerfully, though, is the ability to mutate this tree in memory in real time. In other words, think about even your Gmail inbox or your Outlook inbox.
Typically, you see email after email after email after email. Odds are per today, what HTML tag is creating that UI of row after row after row? Which tag? like table tag like the table tag probably right table row table row table row but it wouldn't make well actually this is the way things used to work in my day back in the day when you visited not even Gmail before it existed but your email inbox you would download from the server a web page containing a table tag with table rows and table data elements and that was your inbox if you wanted to see if you got new mail you just reload the whole page and it would download new contents from the server and show you the new HTML with JavaScript which has come onto the scene over the past 20 plus years. You have the ability to download the data once initially, then use code to just grab some more data every 30 seconds or some more data pretty much anytime an email arrives. And if this picture here represents not our super simple hello title, hello body page, but a whole bunch of table rows for your existing email. The moment you get more email, you can use JavaScript code to add another node to this tree, another node to this tree representing the table row tag. the table row tag again and again. So in short, with JavaScript, you have the ability to change the tree, otherwise known as the document object model or DOM for short, dynamically in order to evolve the web page. So let's take a quick tour of what JavaScript does have syntactically and then I'll just demonstrate some of the capabilities thereof without dwelling today on syntax beyond this. So in Scratch, which is looking pretty good now, you had conditionals which looked like this. In JavaScript, it's pretty much the same as C. The curly braces are back at least for uh two or more lines. Uh but uh indentation doesn't matter except for the style thereof as it uh as in contrast with Python. If you have an if else, it's going to look the exact same in C. 
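The idea of adding another table-row node to the tree can be sketched with the browser's standard DOM calls, `document.createElement` and `appendChild`. Everything else below is an assumption for illustration: the id inbox, the helper name `addRow`, and the made-up senders and subjects; a real mail client would fetch actual data from a server rather than repeating a fixed row.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>inbox</title>
    </head>
    <body>
        <table>
            <tbody id="inbox">
                <tr><td>alice@example.com</td><td>Hello</td></tr>
            </tbody>
        </table>
        <script>
            // Build a new tr node with two td children and attach it to the tree (the DOM).
            function addRow(sender, subject) {
                const row = document.createElement('tr');
                for (const text of [sender, subject]) {
                    const cell = document.createElement('td');
                    cell.textContent = text;
                    row.appendChild(cell);
                }
                document.querySelector('#inbox').appendChild(row);
            }

            // Stand-in for "new mail arrived": add a made-up row every 30 seconds.
            setInterval(function () {
                addRow('bob@example.com', 'Another message');
            }, 30000);
        </script>
    </body>
</html>
```

The page is downloaded once; only the tree in the browser's memory changes afterward, which is the contrast with reloading the whole page.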
If you have if, else if, else, you have the exact same thing as in C. Different from Python because this was elif in Python. Now we're back more verbosely to else if as in C. Variables in JavaScript. Well, here in Scratch is how you set a variable counter to zero. In JavaScript, there's a few ways to do this, but the most reasonable for now is to let counter equal zero. So, you don't specify the type. This is more of a polite way of asking the browser, please let a variable called counter exist and set it equal to zero by default. Semicolons are back. However, that's not strictly true. Browsers are smart enough to know where semicolons actually matter, but for our purposes, assume that they're always there. How do you change counter by one? Well, you can do it the pedantic way, which is a little verbose. You can do the plus equals trick. Or, nicely back in play, is the plus plus in JavaScript, just like in C but not in Python. Loops in JavaScript. Well, in Scratch, if you want to do things three times, here's how you would do it. In JavaScript, it's pretty much the same as C except for not mentioning the data type. Instead, you use the keyword let here. But otherwise, this is exactly the same as in C. If you want to do something forever for whatever reason, in JavaScript you can say while true, which is exactly how we did it in C. If you have a web page like this, meanwhile, and you want to add some JavaScript to it, you can do it in a few different ways. You can put a script tag, just like the style tag, in the head of the web page. This can get you into trouble, though, for reasons you might encounter, whereby if you put your JavaScript code up here and you try to use it to modify the web page, but the web page isn't defined until down here, you can get into a race condition, really, where the data does not yet exist.
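The syntax tour above can be collected into one short JavaScript sketch. The function name `compare`, the messages, and the meow strings are assumptions chosen for illustration; the forever loop includes a break only so that the sketch terminates.

```javascript
// Variables: no type is declared; `let` asks for a variable to exist.
let counter = 0;
counter = counter + 1;   // the verbose way
counter += 1;            // the plus-equals shorthand
counter++;               // the increment operator, as in C
// counter is now 3

// Conditionals: parentheses and curly braces as in C, and "else if" rather than Python's "elif".
function compare(x, y) {
    if (x < y) {
        return 'x is less than y';
    } else if (x > y) {
        return 'x is greater than y';
    } else {
        return 'x is equal to y';
    }
}

// Loops: the same shape as C's for loop, with `let` in place of a type.
const sounds = [];
for (let i = 0; i < 3; i++) {
    sounds.push('meow');
}

// "Forever" loop: while (true), with a break here so the sketch stops.
let ticks = 0;
while (true) {
    ticks++;
    if (ticks === 5) {
        break;
    }
}

console.log(counter, compare(1, 2), sounds, ticks);
```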
So instead of putting it there, or even in another file, it's actually pretty common to avoid that altogether by putting your script tag at the end of the page, just before the end of the body, to ensure that all of the web page exists already. This is similar in spirit to the def issues we saw in Python or the prototype issues we saw in C. There's bunches of solutions, though, to this here problem. But let's now take some JavaScript for an actual spin and use VS Code to write some of it as follows. In VS Code, let me go ahead and close link.html, open up my terminal temporarily, and actually, let's just improve the very file, hello.html, that I have here in front of me, and have it be more interactive and give me sort of a popup on the screen when I type in my name. So, let's start as follows. First, let's go ahead and change this just to hello, just for short. And in the body of this page, let's give myself a form. And in this form, let's give myself an input. We'll turn off autocomplete just to avoid distractions. We'll turn on autofocus to save me a click. I'm going to give this HTML element a unique ID of name, a placeholder also of name just so the human knows what to do, and the type of this field shall be text. In other words, I want to create a program like in week one and week zero where I type in my name and see hello such and such. I'm going to give myself a submit button with input type equals submit. I don't really care what the button says, but I do care now, when I go back to my other tab, close my developer tools, go back into hello.html, that I have something that looks like this. It looks similar to our search example for cats, but now I'm asking the user for their name along with the submit button. But what I want to have happen is when I type in David and click submit, I want to see hello David somewhere on the screen. Well, how can I do this?
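A sketch of the form described here, together with a small demonstration of why script placement matters. The two console.log lines are an addition for illustration and are not part of the lecture's hello.html; the capitalized placeholder text is also an assumption.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <script>
            // Runs before the body has been parsed, so the input is not in the tree yet.
            console.log(document.querySelector('#name')); // null
        </script>
        <title>hello</title>
    </head>
    <body>
        <form>
            <input autocomplete="off" autofocus id="name" placeholder="Name" type="text">
            <input type="submit">
        </form>
        <script>
            // Runs after the form above exists, so the same query now finds the input element.
            console.log(document.querySelector('#name'));
        </script>
    </body>
</html>
```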
Well, a few different ways, but JavaScript allows me to do things like this. And for upcoming problem sets, you won't necessarily have to write JavaScript like this. So consider this a whirlwind tour, not so much something to ingrain. Here I can add a new attribute to the form tag called onsubmit, which as the name suggests means call the following function when this form is submitted. Well, what function do I want to call? I'm going to call it a greet function. And that's it for now. How do I define a greet function? Well, I could, among other places, put this inside of the head of my page in a script tag. I can define a function in JavaScript by literally saying function and then the name of the function and then in parentheses any arguments thereto. I'm not going to have any. And then in curly braces, I can actually define the meat of that function. And for instance, I can do this: let name equal the following, document.querySelector. And now what I want to do is this. Document is a global variable that just comes with JavaScript in the browser that allows me to write code involving the whole document, the web page itself. querySelector is a fancy name for a function that lets me select specific elements of the page using CSS selectors. So the very same syntax we saw with names and with dots and with hash symbols a moment ago is back in play for JavaScript here. So if I want to create a variable that stores the name that the human typed in, what I can do is pass to querySelector a selector for that element, which is quote unquote hash name, where hash just means ID. But the reason I'm using name is because the unique identifier I put here is name. If I change this to foo nonsensically, that's fine. I just have to change this to foo up here. So I'm in full control over what is called what. But if I want to get the value that the user typed into that box, I now do dot value. And we've seen these dots before. In C, they were for accessing structs.
In Python, they were for accessing contents of objects. So this just means use the document global variable, use the querySelector function or method inside of it, get the element whose unique ID is name, and then go inside of that text box and give me its value. So it's a very long-winded way of saying store the user's input in a variable called name. But what's nice now, even though this is going to be a bit ugly, is I can then use a built-in JavaScript function called alert. And I can say something like quote, hello comma, close quote, then plus, which we've seen before in Python, and concatenate with it that name's value. Now, this isn't quite complete, and for reasons I'm going to wave my hand at, I also need to add, annoyingly, return false down here, because otherwise if I click submit, yes, the greet function will get called, but the browser will still try to submit the form to a server, which is going to interrupt my own code. So, long story short, this is a bit of a hackish approach for now to just making sure that the only thing that happens when I submit this form is that my function is called. Now, if I didn't screw anything up, I should now see after reloading this page a prompt for my name. I'll type it in and when I click submit, I should see an ugly but functional alert box pop up with dynamically generated text, namely hello, David. I say it's ugly because by convention, Chrome shows you the full URL or the domain name of the website in question, which is my randomly generated one, which does look stupid. So, we can do better than this. But the point is now that I have written code in JavaScript to listen for the submission of this form and, when that happens, call that function. And this is generally the paradigm of JavaScript. There exists in the context of websites a whole bunch of events that can happen. And this is a word we haven't used since week zero in Scratch.
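Assembled from the description above, the page might look like the sketch below. Placing return false inside the onsubmit attribute, after the call to greet, is an assumption about where "down here" points; it is one common way to stop the browser from also submitting the form to a server.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <script>
            function greet() {
                // Select the element whose id is name, then read what the user typed
                let name = document.querySelector('#name').value;
                alert('hello, ' + name);
            }
        </script>
        <title>hello</title>
    </head>
    <body>
        <!-- Call greet on submission; return false cancels the normal submission -->
        <form onsubmit="greet(); return false;">
            <input autocomplete="off" autofocus id="name" placeholder="Name" type="text">
            <input type="submit">
        </form>
    </body>
</html>
```

Defining the function in the head is safe here because its body only runs later, at submission time, by which point the input exists.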
Recall that in Scratch you have events like when green flag clicked and when the green flag is clicked you can do something in response. Same thing in the world of web programming. Here are just some of the events that can happen in a web page. Like the user can change something, click on something, drag something, key up, put the keyboard up, put the mouse down, or other things. What I'm listening for is the submission of a form, which is cool because in JavaScript then you can essentially write code that listens for any number of these events and then does something when it happens. Consider after all in Gmail, if you click the little refresh icon within Gmail itself to get new mail, it runs some JavaScript code. it turns out to talk to Google's servers, get more email, and update your site. If you click and drag on Google Maps to see like higher up geographically, well, what's happening? Some JavaScript code is listening for your mouse going down and dragging so as to go fetch more tiles, more rectangular pictures of the map wherever you're trying to drag. So, anything that's interactive in websites nowadays like that is using JavaScript by just listening for things that you or someone else might actually do. Well, let me go ahead and start opening some pre-made examples just to give you a sense of the other syntax that is in use today with JavaScript. I'm going to go ahead and open up a version of this hello program called hello2.html, which is different in that I'm practicing what I preached earlier by putting the script tag at the bottom of the page just to ensure that the form and everything inside of it already exists for sure by the time this code executes. Moreover, what I'm getting out of is the business of using the onsubmit attribute. So, just as I tried to get my CSS out of my HTML and put it elsewhere, similarly, I'm trying to get my JavaScript code like the greet function out of the HTML and putting it down here. Now, why is this useful? 
This is a big mouthful, but it just follows a general pattern as follows. document.querySelector quote unquote form is just getting a reference to the actual form element in the page. So if you imagine in your mind's eye that this is drawn out as a tree in the computer's memory, this is just getting me a pointer to the form node in that tree. We haven't seen this before, but it kind of does what it says. addEventListener is a function or method that you can call on any element that just tells it to listen subsequently for this event and, when that event is heard, submit in this case, call the following anonymous function, otherwise known as a lambda function. But long story short, this syntax just means when submit happens on that element, execute the code between these curly braces. What happens? alert, quote unquote hello, plus document.querySelector of #name, dot value. I didn't bother with the variable this time. This does exactly the same thing, but is a purely JavaScript solution without using the onsubmit attribute. And we show you this only because, especially for final projects, you might want to do something like addEventListener to make maybe a drop-down menu or some interactive clickable thing in your website that just listens for one of these events to happen before actually executing some code. Notice I've been conventionally using single quotes in JavaScript because that's just a thing in the JavaScript community, to generally prefer single quotes over double quotes. Why? Well, it means people in JavaScript are hitting the shift key much less than the rest of the world to get double quotes. It's just a convention. So long as you're consistent, either is fine, conditional on not having actual apostrophes in text and such. Let me show you one other convention. 
Instead of putting my code at the bottom of my page just before the body ends, it is also alternatively conventional, as in hello3.html, to do this: to still maybe put the script tag at the top of the page, but to additionally have this magical line whereby you add an event listener before you do anything else that listens for this crazy, weirdly named event called DOMContentLoaded. But now that you've heard DOM briefly, DOM is document object model, which just means the tree in memory. This is just the fancy way of saying when that tree is loaded, go ahead and do the following. And this ensures that when a browser reads all of this code top to bottom, left to right, this code won't actually be executed until the whole DOM is loaded into the computer's memory, until that whole tree is built. So that's all that's being referred to there. The rest of the code is actually exactly the same. What more can we do? Well, just so you've seen it, I can delete all of that code and move it to a file called like hello.js. And in the fourth version of this example, I'm back to just HTML because I can put all of the fancy complexity inside of my script tag here, factoring that code out into hello4.js, but the code is otherwise, I claim, unchanged. All right, this is a lot. I know it's quick, but do the general principles make sense? Like just listening for events and running some code in response? That's really all we're talking about, à la week zero with Scratch. All right. Well, let me let things escalate just a little bit. And this time I'll open the demo first. Let me go ahead and open up hello5.html, which I wrote in advance, which okay, this is definitely starting to look like a mouthful, but in a moment it'll make a bit more sense. Let me go ahead into my other tab here. Click back. Go into source 8, which is all of my pre-made examples. And I said we're in hello5 now. 
And in hello5, there's no submit button because watch this fanciness when I search for something like C or David as a full-fledged word there. Notice it's just happening inside of the web page. Moreover, if you poke around, let me right-click on the page. Let me inspect to open my developer tools. Let me expand the body down here. And actually, let me reload the page. So, notice by default, this is what my web page looks like. It's just got an empty paragraph tag for some reason. But watch what happens at the bottom of the screen. And I'll zoom in a bit more. When I start typing my name like D, and then let me expand this triangle, you see it beginning. A, V, I, D. When I say that JavaScript can mutate the DOM, the actual tree in memory, that's what you're seeing. You're seeing the HTML pretty-printed, color-coded version of that tree in memory. And how is it working? Well, if we go back to the code here, let me wave my hands at this first line. This just means don't do this until the whole DOM is loaded. Let's look at this line, which means give me a variable called input, and set that equal to, okay, the input tag on the page, the text box, and then do what? Well, take that input, add an event listener that's forever listening for keyup, like my finger going up off the keyboard, and when that happens, call the following function, which has no name, but that just means call these lines inside of the curly braces. Well, what happens inside of those curly braces? Well, here's a variable called name. And this is just pointing at the paragraph tag. Apparently, I'm checking this question: if input.value. So this is like saying if input.value does not equal quote unquote, just implicitly, go ahead and set the innerHTML of that name variable equal to hello, comma, input.value. Now, this is crazy syntax and I'm showing it just because you'll see it in documentation online. This is similar in spirit to Python's f-strings. 
It's ugly syntax with dollar signs and curly braces and worse yet back ticks. However, this is a manifestation of really the JavaScript community presumably deciding that if you want the language to evolve, you have to make sure you're backwards compatible with old versions of the language. So, they chose characters and syntax that probably do not appear already in the wild. That's why sometimes things look uglier, I would surmise, than otherwise. But long story short, this just means if there's input there, go ahead and say hello, input. Otherwise, it says by default, hello whoever you are. And in fact, if I go back here and delete my name, watch what happens. It goes back to that default. So, here is just an example of listening for keystrokes going up and down and making sure that the page responds accordingly. How about something else? Let me go back into my directory listing. Let me open up background.html, which I wrote in advance. It's super simple, but this is the first of like an interactive website that has three buttons labeled R, G, and B. As you might imagine, clicking on R does that. G does this, B does that. Well, how is this working? This is the first example now where you can use JavaScript code to alter CSS dynamically. So, let me reload the page. So, it's back to white. Let me open developer tools and watch what's happening now on the body tag specifically. Initially, there's no stylization on the body other than the browser's default margins and whatnot over here. But watch what happens at bottom right when I click on the R button. You see that all of a sudden background color red was dynamically added. Now it's green, now it's blue. And notice the HTML at bottom left is changing too. So somehow I am listening for clicks and then changing CSS in response. So if I go back to VS Code, let's close Hello 5. Let's open up source 8's uh version of background.html. And in here, it's a bit of a mouthful, but the HTML is simple. Here's three buttons. 
And because I wanted them to be uniquely identifiable, I gave them all IDs of red, green, and blue, respectively. And then this code is a bit of copy paste. And frankly, I could probably avoid that if I were more elegant. But just to be pedantic, here's what's happening. Here's a variable called body that's just getting the body element, the node in the tree at that moment in time. And then these three lines of code, their purpose in life is to handle the red clicks. How? Well, we're telling the document to select the element whose ID is red, listen for the click event, and whenever that happens, do this. body, which is the same variable as before, .style, which we haven't seen before, but any element can have a style property associated with it in JavaScript, .backgroundColor equals quote unquote red. And the other blocks of code are the exact same thing for green and for blue. The whole point here is we're now listening for clicks on buttons and changing not the contents of the button but rather the style of the whole page. As an aside, as a curiosity, this is what's known as camelCase whereby, like a camel has a hump in the middle, this word has a hump in the middle, like capital C all of a sudden, to separate the two words. In CSS, recall, it was a moment ago background-color. Anyone want to guess why this is not how you write it in JavaScript? Anything with hyphens in CSS is changed to camelCase in JavaScript. >> It's not related to comments. It's simpler than that. Yeah. >> Yeah. Right. Like left hand wasn't talking to right hand and people realized, oh damn it. Like this now means background minus color, which is not a thing because minus is indeed, just like in C and in Python, a mathematical operator. So, the world decided to reconcile this problem by just capitalizing the character that would otherwise be where the hyphen is. Well, little CSS trivia. All right, what else can we do? How about a couple of final examples here? 
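The hyphen-to-camelCase convention just described is mechanical enough to sketch in a few lines. The lecture's own code is JavaScript; the version below is only a Python illustration of the naming rule, and the function name css_to_camel_case is hypothetical, not part of any browser or library API.

```python
def css_to_camel_case(prop: str) -> str:
    """Convert a hyphenated CSS property name to the camelCase spelling
    used for style properties in JavaScript (illustrative helper only)."""
    first, *rest = prop.split("-")
    # Keep the first word as-is; capitalize the first letter of each later word.
    return first + "".join(word.capitalize() for word in rest)


for prop in ["background-color", "font-size", "visibility"]:
    print(prop, "->", css_to_camel_case(prop))
```

A property with no hyphen, like visibility, comes back unchanged, which matches how the blink example later refers to it.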
So, what more can we do with CSS? So, back in my day, too, we had a tag called the HTML blink tag, which is among the few tags in the world of HTML that has actually been deprecated, that is, removed from the language. Like no one removes things from languages generally, but the blink tag was so hideous, followed only by the marquee tag, whereby my own homepage as like a freshman had like welcome to my homepage just moving across the screen like this from left to right for no good reason, like an ugly marquee, like on digital signage nowadays. But we can bring it back as follows. So if I close out my developer tools, go back into my source 8 directory and open up blink, this is what the blink tag used to do back in the day. Now, this version is implemented instead in JavaScript code as follows. I have a function here called blink, which I'm apparently calling every once in a while. How is that happening? Well, let's scroll down. Here's my HTML, super simple. Literally just says hello world. But notice this. There's another global variable we haven't seen in JavaScript called window. That refers to like the general window, not necessarily the contents of the page, where you can call a method called setInterval. And you can tell that method setInterval to call a specific function every number of milliseconds. So if I want to call blink every 500 milliseconds, that's the line of code that I use. If I scroll up to now this function, let's see how blink is implemented both now and perhaps back in the day. Well, body is a variable here that's just pointing to the body node in the DOM. And this is a big mouthful, but if that body's style's visibility property in CSS is quote unquote hidden, then change that body's style's visibility property to be visible. Otherwise, change it to be hidden instead. Here too, I don't understand why left hand and right hand weren't talking to one another. 
You would think that the opposite of visible would be invisible, but in CSS, the opposite of visible is hidden. You just have to memorize stupid things like that. But what's this really doing? It's just changing the CSS from hidden to visible, hidden to visible, every 500 milliseconds. So in fact what you're seeing here in the blink is, if I inspect this page too, and now notice it's kind of fun just to watch it, you can see the HTML at bottom left and the CSS at bottom right just automatically changing because I'm doing that every 500 milliseconds. All right. How about one other? Well, autocomplete. Well, we saw a step toward this with my hello, David example a moment ago. Super common though in Google and like every website now to automatically try to finish your thought. How is that happening? Well, that's not just HTML and CSS. That is also some JavaScript thrown into the mix. So, for instance, let me go into my terminal and open up source 8's example called autocomplete.html. And here I am going to borrow a file called large.js, which is just a massive version. I'll open that too if you're curious. large.js is just a huge JavaScript array, arrays are back, containing all of the words from problem set five, the spellchecking problem set where you had 100,000 plus words in C in a file given to you. Now we've converted that to JavaScript by using a global variable like this in the code here. What's happening? Well, apparently there's going to be a text box at the top of the page that we see. Then there's an empty unordered list. So an empty bulleted list. And then there's this code down here. I'm apparently creating a variable called input that's referencing that text box. I'm then listening for keyup just like we've done before. And then I'm doing this. I'm setting a variable called html equal to quote unquote nothing. So an empty string. And then I'm checking, does the input text box have any value, implicitly. If so, what am I doing? 
This is kind of cool. It's a bit of Python and C together syntactically: for each word in the words array. JavaScript uses the keyword of instead of in like Python, but so be it. What I'm doing now is in JavaScript, I'm saying if that current word in that big file of 100,000 words starts with whatever the user typed in, go ahead and add to that html string using plus equals, which is just concatenation, we've seen plus before, the following: an li tag inside of which is that specific word. And so in effect, what you're seeing now is what almost every website nowadays does. They're not manually writing HTML like we've been doing much of today. They're writing code that dynamically generates HTML because the programmers understand what HTML is. They understand that unordered lists have li children. And so using this string that I've highlighted, they're creating li element after li element for the purpose of changing the innerHTML of the ul element to be the value of that variable. And this is a very long way of saying how is autocomplete implemented in general. Well, just like this. If I search for cats by typing in C, there's every word in that 100,000-word dictionary that starts with C. A, T, S. And there's every word that starts with C-A-T-S. Meanwhile, watch what happens underneath the hood. If I open up my inspect tab again and I go to my body, inside of this is the empty ul, but watch as soon as I start typing something like C. Now I can expand the triangle because there is an li element that's been created for every one of the words that match. As I do A, T, S, now I've got just four of them. And there is cats, there is Catskill, and so forth. So anytime you go to google.com like we did earlier, when we went to google.com and started searching for cats, where are all of those search results coming from? 
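The autocomplete loop walked through above reduces to a prefix filter that builds up a string of li tags. The lecture's version is JavaScript running on keyup; what follows is only a Python sketch of the same logic. The five-word WORDS list is a made-up stand-in for the 100,000-plus-word array, suggestions_html is a hypothetical name, and the escaping step is an added precaution rather than something the lecture describes.

```python
from html import escape

# Tiny stand-in for the large word list described in the lecture.
WORDS = ["cat", "catalog", "cats", "catskill", "dog"]


def suggestions_html(typed: str, words: list[str]) -> str:
    """Build one <li> per word that starts with what the user typed."""
    html = ""  # start from an empty string, as in the lecture
    if typed:  # only search when the text box has a value
        for word in words:
            if word.startswith(typed):
                # += is plain string concatenation
                html += f"<li>{escape(word)}</li>"
    return html


print(suggestions_html("cats", WORDS))
```

In the browser, the resulting string would then be assigned to the innerHTML of the empty ul element; here it is simply printed.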
Someone wrote JavaScript that's listening for keyup or the like and then dynamically populating an unordered list or in this case a much prettier list of the matching results. And the final example that we thought we'd leave you with, and again the whole purpose of introducing JavaScript is to give you a taste of its syntax and its relative familiarity, but with the power with which you can leverage it to make websites so much more interactive. And in fact, with Bootstrap, you don't just get CSS you can use, you have a whole set of JavaScript functionality. So you can have drop-down menus and the like. For instance, among the things you'll use for an upcoming problem set and perhaps your final project is something that looks a little like this. In bootstrap.html, here's a whole bunch of code that I literally copied and pasted from Bootstrap's documentation. And it's just like boilerplate code for a corporate website that has features with pricing and disabled menu options as well, just for the sake of discussion. And then here, if I go back into this example, you'll see a fairly simple website that looks like this. A so-called navbar with all of the main menu options of like a corporate website. And notice if you start to resize the window, which I'll do here, and put it into sort of mobile mode because it's so narrow now, thanks to JavaScript, it's listening for clicks on this hamburger menu and revealing the menu options that way. This is quite like how CS50's own website works and so many other websites out there. But the last one we thought we'd use is this: you're so in the habit of using Google Maps or Uber Eats or any number of apps that need to know your location. That too is exposed through JavaScript quite simply. 
Let me go ahead and in geolocation.html open up the following code, which is super simple even though there are some new functions. There exists another global variable in JavaScript in browsers called navigator, which has a property, an object called geolocation, which has a function called getCurrentPosition that takes an argument which is just an anonymous function, which means call this code when you're ready to know the coordinates, because it might take a while to figure out your GPS coordinates. And once you do, this simple example is just going to write to the document, that is the rectangular page, the position's latitude that comes back and the position's longitude that comes back. So to see this in action, let me go ahead and open up that second tab. Go back into geolocation. Notice, for privacy's sake, it's asking me to approve this. So I'm going to say allow this time. There are apparently my laptop's GPS coordinates. And if I go to Google Maps, I can actually paste this in here. Enter. And looks like if we zoom in, in, in, okay, I'm not technically outside, so it's only close to a degree of precision, but it's probably mapping to one of the Wi-Fi access points that's on that corner of the building. So, we're pretty darn close, pretty much close enough to get me my food or my ride here. And a final note, now that you've seen a little bit of JavaScript, let me go ahead and open up just 60 final seconds of just how much effort it took us to put not only this lecture together, but particularly that example of the teaching fellows passing packets. Everything, we like to think, is very finely flourished here. But here's a little bit of behind the scenes in these final 60 seconds together. If we could dim the lights before we adjourn. >> Off you go. Offering. Okay, Josh. Nice. Helen. Oh, Bentimony. No. Oh, wait. [music] That was amazing. Josh um Sophie [laughter] Amazing. That was perfect. 
[music] [laughter] >> I think I over to you all. >> Oh, nice guy. [music] That was amazing. Thank you all. >> So good. >> All right, that's it for CS50. We'll see you next time. >> [applause] [music] >> All right, this is CS50. This is already week nine. And I dare say this week is the most representative of what you'll be doing after the class if you so choose to program in the future and tackle some project that's new to you. In fact, the closest to this week was perhaps week six, wherein we didn't really introduce all that many new concepts but really translated them from C into Python. And so this week in particular, the goal is to really synthesize the past 10 weeks of class, drawing upon a lot of the building blocks that are hopefully now metaphorically in your toolbox, and give you an opportunity now to apply those ideas to new problems. In particular, web programming. So every day you and I are using the web in some form. Every day you and I are using mobile apps in some form. And we said last week that the languages underlying a lot of those applications are HTML and CSS for the layout and aesthetics, and then also in part JavaScript for a lot of the client-side interactivity that you might experience nowadays. Well, today we come full circle and bring back a server-side component whereby we'll again write some Python, we'll again write some SQL code, and use it to make our own full-fledged web applications and in turn, if you so choose, mobile applications for your final project as well. So up until now when we did anything with the web, you ran this command last week, http-server, which literally did just that. It spawned a so-called HTTP server, that is, a web server whose purpose in life is just to serve up content from like your current folder, any files therein, any folders therein. And so all of the URLs generally followed a certain format. 
So if your URL were example.com/, recall that the slash just denotes the root of the web server, and so in there typically by default you would see a directory index. We'll see today that that goes away because generally when you visit something.com/ you want to see the actual website, not the contents of everything in the server. So we'll see how to address that. But the URLs up until now have been of a form like file.html, literally referencing a file in that folder, or folder slash, which just means whatever is inside of that folder, or folder/file.html, or dot dot dot. You can nest these things however deep you want. And recall that more generally we said that you're referring to some kind of path on the server, where the path is a set of folders ending in perhaps a file name. So today we're going to generalize that, at least in terms of nomenclature, and start talking more about routes because essentially in web programming we are going to exercise a lot more control over what is in the URL. So back in the day it referred to literally a file on the server, and as recently as last week the URLs referred to literally a file on the server. However, we'll see in code that we can actually just parse this, that is, analyze what is after the domain name in a URL and just use this as generic input to the server to figure out what kind of output to produce. We're going to see the same convention though. If you want to pass in specific parameters, key value pairs, we'll use a question mark after our so-called route, then key equals value. And then if there's another one or more, we'll just separate them by ampersands. And to do all of this, we're going to recall the inside of those virtual envelopes. Recall that if we did something like on google.com to search for cats, what was really being sent to the server was a request for /search, which notice is not search.html. There's no folder per se there. This is just the name of a program really running on Google's servers. 
And that's going to be the so-called route that we ourselves start programming today. Question mark q equals cats just meant that the query parameter, the input from the web form, is going to contain in this particular example the word cats. So how are we going to do all this? So we could implement our own web server in C. It would be a nightmare to use a language as low-level as C and actually deal with something as high-level as writing code for the web. We're instead going to use Python for the most part, if only because it's much higher level. But even then, if we wanted to do this thing from scratch, we would have to write a lot of Python code to analyze the insides of these envelopes, figure out what inputs are being passed to the server, and then figure out how to access that in Python code. It's just a lot of work to just get a web application up and working. And so what the world generally does is they don't reinvent the wheel of writing their own web server. Rather, they use an off-the-shelf, fairly generic web server, or application server as it might be called. And we for instance are going to use something called Flask. Now Flask is a framework, as the world would say, or more specifically a microframework, which just means it's a library of code that other people wrote to make it easier for us to implement web applications. So they took the time to figure out how to handle GET requests on a server, POST requests on a server, figure out how to extract key value pairs from URLs, the sort of commodity stuff that literally every web application on the internet has to do anyway. So we don't have to retrace those steps ourselves. What this will allow us to do is only implement the problems that we care about by using this framework. 
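To make the route and key-value-pair idea concrete, here is a minimal sketch of pulling those pieces out of a URL with Python's standard library. It is not how Flask does it internally, just an illustration of the kind of parsing a framework spares us from writing. The helper name parse_request_target is hypothetical, and the extra hl=en parameter is added only to show the ampersand separator.

```python
from urllib.parse import parse_qs, urlsplit


def parse_request_target(url: str):
    """Split a URL into its route (path) and its key-value parameters."""
    parts = urlsplit(url)
    # parse_qs maps each key to a list of values, since a key may repeat.
    params = parse_qs(parts.query)
    return parts.path, params


route, params = parse_request_target("https://www.google.com/search?q=cats&hl=en")
print(route)        # the route after the domain name
print(params["q"])  # the value(s) submitted for q
```

Note that each value arrives as a list, because the same key is allowed to appear more than once in a query string.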
And to be clear, a framework, much like Bootstrap, is not only a library that someone else has written for you, but it's like a set of conventions that you follow in order to use the library in their recommended way. So it's more of a generic term that includes a library and a set of conventions. And how do you know how to use either? You just read the documentation or take a class in which we're about to give you an introduction to some of this right here. So instead of running today http-server to start a web server that just serves up static content, files and folders in our account, we're instead going to run the command, moving forward, flask run, and this is going to look for code that we've written in our current directory, and if it is in accordance with the conventions to which I'm alluding by using the so-called framework, then it's going to start our web application on some TCP port, for instance 8080, as we discussed last week. To do this, all we have to have in our current folder is minimally a file called app.py by default. This is hinting at an application in the language called Python. And what code we put in there we'll soon see. And then ideally we would have another text file called requirements.txt by convention, inside of which is just, one per line, the name of all of the libraries that we want this web application to include. In other words, if I go over here to VS Code, if I don't have such a file, that's fine, but I want to use a framework like Flask. Recall our pip command for installing Python packages. I could just say pip install flask, enter, and that would go ahead and install the Flask framework or library for me, just like we did a few weeks ago with installing the silly little cowsay library as well. 
I've already done that in advance and, better still, I've come with my code today, both of these files, app.py and requirements.txt. And in fact, if I go ahead and create one just for fun here, all you need do in a requirements.txt text file is literally put the name of the library that you want to include, and then you run pip in a slightly different way to install that library or any other libraries that are in that file as well. So let me wave my hands at requirements.txt moving forward. It just means what libraries do you want to use with this web application, so you don't have to remember or memorize them and type them all out manually. All right. So what's going to go inside of app.py? Well, the minimal amount of code that we can write to make our own web application that does something like print out hello world to my browser could look like this. Now, there's a bit of new syntax here, but not all that much today moving forward. The very first line just says from flask import Flask, which is a weird way of just saying give me access to the Flask library. Capitalization matters. And so, the package that we're using is called flask, lowercase, but we want to have access to a special function in there called Flask, capital F. So this is sort of a copy paste line. The next one's a little weird looking, but it essentially says give me a variable called app and turn this file into a Flask application. We haven't seen this in a few weeks, but there was that weird if conditional that we put at the bottom of some of our Python code a few weeks back that just said if dot dot dot, and it mentioned in there name: if __name__ equals equals quote unquote __main__. So we've seen an allusion to __name__. For our purposes, __name__ just refers to whatever the name of this file here is. No matter what I call it, you can sort of access the current file by way of this special global variable. 
So this line collectively just means turn this file into a Flask application and store the result in a variable called app. So I can now do stuff with Flask. And what am I going to do? Well, down here, let me first point out a familiar syntax. I'm defining a function that I called index by convention, but I could have called it anything I want, whose sole purpose in life is just to return quote unquote hello, world, which is the super simple output this web app is going to display. But, and this is the new syntax, I'm using here what's generally called a Python decorator, which is a type of function that essentially affects the behavior of the function right after it. So, by saying @app.route quote unquote slash, this is telling the Flask framework, associate this index function with this route, the single forward slash. And that's how we're going to take over the default behavior of the slash portion of the URL, by telling it to return whatever this function returns. And we'll see this in action now. So let me go over here, say, to VS Code. And within VS Code, I'm going to whip up exactly that application in a file called app.py. Just so as to combine this and some subsequent examples in the same folder, I'm going to first create a directory or folder called hello. I'm going to go into that hello folder. I'm going to go ahead and recreate that same requirements file just for good measure to tell the world that I want to use the Flask library here. And then I'm additionally going to create now app.py. And I'll type this fairly quickly, but I'm just reciting what we saw a moment ago. From the flask package, import the Flask function, lowercase f, capital F, respectively. Then give me a variable called app. Set it equal to that function call, passing in the name of this file, whatever it actually is. 
And then lastly, let's go ahead and call @app.route quote unquote slash, which says, hey, Python, whatever the next function is, associate it with this slash route. And so I'm going to define that function. I could call it anything I want, foo or bar or baz. But insofar as slash represents the index of the website, like the default page, I'm just going to go ahead and call it by convention index and then return for now hello, world. And that's it. So whereas last week, when I was writing code in HTML files, I was making web pages, now I've created what we'll call a web application. And it's an application in the sense that there's actually some logic going on there. There's some functions, there could be some conditionals, there's clearly a variable, there could be loops, and all of the sort of stuff we've seen in Scratch and C and Python as well we'll now see back in this Python file. So, how do we now run this? Well, let me go back into my terminal window here, and I'll clear it just for good measure. I'm going to go ahead and run flask run, enter. I'm going to see some cryptic looking output, but there's that familiar pop-up with the green button that wants to open up this application. Whereas http-server uses port 8080 by default, Flask uses port 5000 by default. And here we have it. I've just opened up my second tab, and we spent a lot of time there last week. This is the server I'm running, not on port 8080, but on port 5000 today. And there is the contents of what was spit out by my very first application. Now, even though the browser is rendering this like it is a web page, notice this. If I right-click or control-click anywhere on the screen and go to view page source, you'll see that there's no actual HTML on this page. It's literally a single line of text, hello, world. 
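As a rough sketch, the app.py just described might look like the following. The test client at the bottom is an addition for illustration, not part of the lecture's file; it is assumed here only so the route can be exercised in-process rather than by running flask run and opening a browser tab.

```python
from flask import Flask

# Turn this file into a Flask application and keep it in a variable called app.
app = Flask(__name__)


@app.route("/")
def index():
    # Whatever this function returns becomes the body of the HTTP response.
    return "hello, world"


# Illustration only (not in the lecture): request "/" without starting a server.
response = app.test_client().get("/")
print(response.status_code, response.get_data(as_text=True))
```

In the lecture's workflow, you would leave off the last two lines and simply run flask run in the same folder as app.py.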
If I close that and right-click or control-click again and go to inspect, like we did last week to open up developer tools, you'll see that the browser has actually filled in some blanks here for me by just rendering, as it should, the minimal possible web page. But the content I actually sent to the web browser is only literally hello, world. So how can I actually send a web page of my own rather than letting the browser do something like this? Well, I could go ahead and close that and go back to my application. I'm going to go ahead now and hide the terminal just because the server is still running. And what I'm going to go ahead and do here is, well, nothing's really stopping me from returning not just a string of text, but a string of HTML. And this might not look pretty, but let me go ahead and do open bracket doctype HTML close bracket, then html, then head, then title. And I'll just title this, for instance, hello to keep it simple. Slash title, slash head, open bracket body, hello, world, slash body, slash html, close quotes. And I used single quotes in this case, but I could have just as easily used double quotes, but that's a full-fledged web page. Like, that's the minimal amount of content we saw last week. Actually, you know what? For good measure, let's actually add lang equals quote unquote en. So it's actually fortuitous that I used single quotes, because now I have some double quotes inside. And even though this is not pretty printed, it's just one massive mouthful of HTML all on one line. Fine. When I now go back to the browser, reload the page as by clicking here, and then view page source again, here's what my browser received this time. Indeed, it's the full-fledged HTML. And in fact, if I close that tab and reopen developer tools via inspect, now we'll see in the tab absolutely everything that I sent over, including a title, including the lang equals en. And had I typed even more, we would have seen that, too. All right. 
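The one-long-string version described above can be sketched roughly like this. The exact markup is reconstructed from what was read aloud, so treat the string as an approximation; the test client lines are again an assumption added only to show what the browser would receive.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def index():
    # Single quotes on the outside, so the double quotes around "en" need no escaping.
    return '<!DOCTYPE html><html lang="en"><head><title>hello</title></head><body>hello, world</body></html>'


# Illustration only: this is the text that "view page source" would show.
response = app.test_client().get("/")
html = response.get_data(as_text=True)
print(html)
```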
So, what was the point of this exercise? It feels as though I've really just taken more time and added more complexity to achieve literally what I could have done last week by just creating index.html myself without any Python code. But I dare say what we're trying to do is lay the foundation for a full-fledged interactive website that maybe has forms that we can submit to the application, that allows us to generate not just one page, but maybe two or three or any number. So what you're seeing here is sort of the beginning of google.com's search application or gmail.com itself or facebook.com. Any web application you can think of begins with a little code that theoretically looks a little something like this. But this is kind of stupid, to put HTML, hardcoded no less, in one long string here inside of my application. Let's try to factor this out. That was a lesson we preached last week about sort of factoring out our JavaScript, factoring out our CSS. We can do the same thing with our actual HTML here. And so what I'm actually going to do is import not only the Flask function, but also another function that, per its documentation, comes with Flask, called render_template, with an underscore in between. This is a function whose purpose in life is to render a template, so to speak, of HTML. We'll see what we mean by template in just a bit. But down here, what I'm going to do is now delete all of that code. And let me just assume that I'm going to put that same code in a file called index.html, just like I did last week. So let's instead return the return value of render_template of quote unquote index.html. Now that file does not yet exist. Indeed, if I go into my terminal window and create a second terminal, just so I can leave the server running but still see what's going on, I'm going to cd into that same hello directory, type ls to list my files, and I only see app.py and requirements.txt. 
But it turns out, per Flask's documentation, if you want to create your own HTML files, you simply have to add a directory that by convention is called templates. And that's it. So in addition to app.py and requirements.txt, I need a folder called templates. So let's go back into VS Code: mkdir templates. Capitalization matters, all lowercase. Now, let me go ahead and cd into templates and run the code command and create a file called index.html in the templates folder. And then super quickly, let me hide this. Let me whip up that same page again. Doctype HTML, html lang equals quote unquote en, close bracket, head, close bracket, title, close bracket, hello, and then down here body, close bracket, hello, world. So autocomplete is helping me type quickly. But now I have a file with my HTML that this application, I claim, is going to spit out automatically for me. So let's see the effect. Let me go back into my other browser tab. Let me close the developer tools and let me quite simply just click reload. And no apparent change. It's working exactly as it did before, but I've laid the foundation for making a much more useful layout of my files so that I can actually keep my logic, my Python code, and my HTML a bit separate. All right. Well, how can we make this into something even more interesting? Well, let's start to take some actual user input, for instance. So, wouldn't it be nice if I could pass in via the URL not something like q equals cats, but maybe something like name equals David or name equals Kelly, and actually see the name that's being outputted? In other words, let me zoom in up here and let me pretend like this happened automatically. Let me do question mark name equals David. Enter. Well, it would be nice if I saw hello, David, I'll propose, rather than just hello, world. So, how do I actually get access to everything after the question mark? Well, here is where a framework like Flask and any number of alternatives starts to shine. 
It gives me that answer automatically. And so it turns out in Flask, once you've used it, you have access to a special global variable, as we'll call it, called request.args, where args just means the arguments or the parameters that were passed in to this HTTP request. So how do we use this? Well, let me go back to VS Code here. And at the very top line, in addition to importing Flask, capital F, and render_template, let's also import request, which is a global variable that comes with the Flask framework. And then I'm going to use it as follows. I'm going to go ahead and add a second argument to the render_template function, where I'm going to say placeholder equals request. Actually, let me not do that yet. Let me first create a variable, name equals request.args. And then let me go ahead and get the name key from the arguments. And then down here, let's go ahead and pass in placeholder equals name. So what am I doing here on line 8? I'm creating a variable called name. I'm storing in that the value that's in the request global variable, in what's apparently a dictionary called args, specifically the name key therein. So if the thing after the question mark, name equals, is David, this should give me David. If it's Kelly, it should give me Kelly instead. Then what I'm doing is rendering this template called index.html, but I'm additionally passing in some named parameters. We talked briefly about that in week six, when we introduced the idea that Python can take not only a comma-separated list of arguments, but some of which can have names. So I'm proposing that one such name of an argument to this render_template function can be placeholder, for instance. Now, at the moment, this code isn't going to do anything useful. If I go back indeed to the other tab, click reload after zooming in, even with my name in the URL, you'll see that we still see hello, world. But here's where things now get interesting. And here too is what we mean by template. 
If I go back into VS Code, open up index.html again, and instead of putting the word world there, what I'd like to see is not hello, world, but hello, placeholder. But of course, if I literally type that, I'm going to see literally placeholder, unless I surround placeholder with pairs of curly braces like this. And by using these pairs of curly braces, I'm telling Flask that I want to interpolate, so to speak, that variable. I want to substitute in its value. So this is yet another syntax. In Python, we saw f-strings. In C, we saw %s when using something like printf. In an HTML file, when using Flask specifically, we use these pairs of curly braces to denote that this is indeed a placeholder whose value should be plugged in. So now let's go back over to the second tab. Recall, if I zoom in, that passed in already to this URL is question mark name equals David. And this time when I click reload, voila, now I see my actual name. And unlike the JavaScript examples last week, which were doing everything client side, notice here, if I right-click or control-click and view page source, what's noteworthy today is that David in this case literally came from the server. This was not rendered client side. The server sent this HTML and specifically this text. So, if I go back to the same tab here, zoom in, and change David, for instance, to Kelly, what I should see instead when I hit enter is hello, Kelly. And indeed, if I go back to the source code and reload the page there, I should see in the view page source that the server sent indeed hello, Kelly. So, it's in this sense that it's an application. The URL is providing input to the application by way of this URL format, the so-called GET string that's being passed in. And if I look at the code that I'm running, app.py is the code that's running. It is grabbing that name from the URL. I am then passing it into my index.html file, and then my HTML file is plugging the actual value in for me. 
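To make that flow concrete, here is a minimal sketch of the URL-to-template round trip. One assumption to flag loudly: the lecture keeps the HTML in templates/index.html and calls render_template, whereas this sketch inlines the same markup in a string (INDEX_HTML, a made-up name) and uses Flask's render_template_string so that everything fits in one file.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Stand-in for templates/index.html (inlined here only to keep the sketch in one file).
INDEX_HTML = """<!DOCTYPE html>
<html lang="en">
    <head>
        <title>hello</title>
    </head>
    <body>
        hello, {{ placeholder }}
    </body>
</html>"""


@app.route("/")
def index():
    # Grab the value after ?name= from the URL.
    name = request.args["name"]
    # Pass it to the template as a named argument called placeholder.
    return render_template_string(INDEX_HTML, placeholder=name)


# Illustration only: simulate visiting /?name=David and /?name=Kelly.
client = app.test_client()
david_page = client.get("/?name=David").get_data(as_text=True)
kelly_page = client.get("/?name=Kelly").get_data(as_text=True)
print(david_page)
```

The printed page contains the name itself rather than the curly braces, which is the point made above: the substitution happens on the server before anything reaches the browser.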
And so what's going on with, for instance, these curly braces? Well, here too is where we're actually using a library. And included in Flask is another library called Jinja. And Jinja is what's called a templating library. And there are so many templating libraries in the world. Jinja is actually fairly simple, which is nice, and which is why Flask uses it. And for now, you can just think of Jinja as being the library that knows how to interpolate variables inside of pairs of curly braces. So why are we introducing yet another library? Well, the folks who implemented Flask decided that it was not worth their time reinventing the wheel of a templating language, a language via which you can figure out what values to plug in where. So they just lean on another library that someone else wrote years prior so as to not reinvent that wheel themselves. And that's all that's going on with a framework. In this case, it's using perhaps multiple libraries instead. All right. So what then is a template? So this then is a template. What you're looking at here, hello, placeholder, is a template in the sense that it's kind of the blueprint for the web page I want the user to see, but it's going to be dynamically generated using indeed this blueprint by plugging in the value of placeholder inside of those pairs of curly braces. And so that's why index.html, starting today, is in a folder called templates, because this is not just static HTML like the stuff we wrote last week. This is the blueprint for the actual HTML that we want the server to spit out. But there's a bug here. Notice what's going to happen here. If I go up to this URL and I get rid of the name altogether, for instance, I just visit the slash route without any key-value pairs and hit enter, this is sort of bad: bad request. It's an HTTP 400. In fact, if you look at the tab, here's another HTTP status code that we probably haven't seen before. 
But 400 just means the user did something wrong by not passing in the parameter that was expected. Well, that's a little bad design if, like, the user has to manually type in things to the URLs. Like, no human actually does that. That's not good for business or customers in general. So I can go back into app.py and just make a little bit of conditional code here. And here too is where we see what makes this an application and not just a static page. Instead of just blindly getting the name here, I could instead do something like this. Well, if the name parameter is in request.args, and this is just Python syntax for asking if this key is in this dictionary, then I'm going to go ahead and define name and set it equal to request.args quote unquote name. Else, if there is no name in the request, well, then I might as well give some default value like name equals quote unquote world. And that alone logically makes sure that I only try to access request.args' name if the key is actually there. So, if I go back to the browser now, reload without anything else in the URL, now we're back in business and it's saying hello, world. But if I go up to the URL bar and add name equals David, enter, that too now works. So, it's a web application in the sense that not only does it have function calls as well as a variable, but now we've got some conditional logic with Boolean expressions as well. All right, questions on anything we've done thus far, because it was a lot all at once. Questions thus far? Yeah. >> Good question. Let's try that. What if I just did question mark name equals nothing? Well, let me go back to that other tab, delete the name David, and hit enter. And I indeed see hello, nothing. Why? Because the name key is provided now. It just doesn't have a value. And so the conditional has the same answer. Well, yes, name is in request.args, but there's just no value associated with it. And here again is the value, or a hint at the value, of using a framework like Flask. 
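The bug and its fix can be sketched side by side as follows. The /strict route is hypothetical, added only so the unguarded lookup and the guarded one can live in the same file; the inline template string is likewise an assumption standing in for index.html.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Tiny stand-in for the index.html template.
GREETING = "hello, {{ name }}"


@app.route("/strict")
def strict():
    # Hypothetical route: indexing a missing key makes Flask answer 400 Bad Request.
    return render_template_string(GREETING, name=request.args["name"])


@app.route("/")
def index():
    # Only index into request.args if the key is actually there.
    if "name" in request.args:
        name = request.args["name"]
    else:
        name = "world"
    return render_template_string(GREETING, name=name)


# Illustration only: try the cases walked through above.
client = app.test_client()
print(client.get("/strict").status_code)                   # 400
print(client.get("/").get_data(as_text=True))              # hello, world
print(client.get("/?name=David").get_data(as_text=True))   # hello, David
print(repr(client.get("/?name=").get_data(as_text=True)))  # key present, value empty
```

The last request mirrors the question from the audience: the key is present, so the condition is true, but the value is an empty string, so nothing follows the comma.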
The fact that I can just import the request global variable and then ask questions like, is this parameter in this dictionary, means I don't have to write any of the code that, like, figures out what the URL looks like and breaks it apart between the question mark and the equal signs and any ampersands therein. That's all sort of generic logic that every web application has to do. So again, Flask is sort of doing that lift for me, and I can just focus on the logic that I actually care about. All right. Well, a quick convention here. I've used the word placeholder here just to kind of hit the nail on the head and make clear this is a placeholder, but frankly it's a little more readable stylistically to not just put hello, generic placeholder, but to say something like hello, name, so that a colleague or even myself looking at this file down the line knows that, okay, we're trying to print out the user's name here. That's fine. You can change the name of these variables to be anything you want. And even though it looks weird, it's conventional in Flask to do something like this: name equals name. But each of these names means something different. This is the name of the placeholder that I'm going to put in my actual template. This is the value that I actually want to give it. And it just keeps me a little saner by just reusing the same name instead of calling it placeholder or placeholder 1, placeholder 2, placeholder 3, or something generic like that. Now it's just a little clearer, even though it looks weird to say name equals name. Again, that just allows me to do this in my template. All right. Well, what more can I do after that? Well, let me propose that we can actually go in and simplify this code a little bit. It turns out this is so common, to just ask a question as to whether the parameter is there and then do something with it or not, that Flask comes with some logic to do this. And in fact, I can get rid of all four of these lines. 
Just go ahead and with confidence declare a variable called name, set it equal to request.args, but in this so-called dictionary, use a function called get that comes with it, which technically doesn't relate to the verb that was used by HTTP. This just means literally get me the following. And if you want to get the parameter called name, you literally just say quote unquote name. However, in case there is no name parameter, you can also give this function a default value like world. And so now we've collapsed from four lines into one that exact same logic. So this gets me the HTTP parameter called name. But if it's not there, it gives me a default value of world, so that no matter what, this name variable has what I care about. Indeed, if I go back over here, let's type in, how about, name equals David again. Enter. That's there. If I type in no name, enter, that too is now working as well. All right. Well, let's see if we can refine this a bit more. Let me propose that in our next version of this, let's introduce a second route. So two URLs, much like Google has many different URLs, as does most any web application. At the moment, I'm doing everything in my slash route. So how might I move away from this? Well, let me go ahead and not only add a second route, but an actual form via which the user can type in their name. So to do this, let me propose that in index.html, instead of just printing out the user's name and trusting that they're going to have typed their name in manually to the URL, which again is not normal behavior, let's actually show the user a form via which they can do exactly that. So here's my form tag. Let's say the method I'm going to use is get, so that I see everything in the URL. Let's give myself an input whose name is name, because this is the human's name. 
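The four-lines-into-one simplification described above, a lookup with a default, can be sketched like so. As before, the inline template string and the test client are assumptions for illustration; the lecture's version calls render_template with index.html.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)


@app.route("/")
def index():
    # get returns the value for "name" if present, else the default "world".
    name = request.args.get("name", "world")
    # name=name: the left side is the template's placeholder, the right side the Python variable.
    return render_template_string("hello, {{ name }}", name=name)


# Illustration only: with and without the parameter.
client = app.test_client()
print(client.get("/?name=David").get_data(as_text=True))  # hello, David
print(client.get("/").get_data(as_text=True))             # hello, world
```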
And notice, somewhat confusingly, this name on the left is the HTML attribute that we saw last week. So, it's different from what we just did in Python, even though they're all called the same thing. The type of this input is going to be text. And let's go ahead and make this a little more user friendly. Let's put some placeholder text called name, so the human knows what to type in. Let's go ahead and disable autocomplete, just so we don't see previous input into this text box. And let's autofocus it so that the cursor is blinking in the text box by default. Then lastly, let's go ahead and have a button, the type of which is submit, so that clicking this button actually submits the form. And I'm just going to call this button, like, greet, because I want the user to be able to greet themselves by clicking this button. Now I should specify action. The only other time we used action is when we actually went to https://www.google.com/search. That's not relevant today because I'm trying to print hello, world, not search for cats and such, but this is where I too have control. If I want to submit this form to a specific location in my web application, action is where I can specify it. So why don't I pretend that there exists a route in my application called /greet, and if you go to example.com/greet question mark name equals David, this now will greet the user with hello, David, for instance. But /greet does not exist. If we go back to app.py, literally the only route that currently exists is single slash, but I can change that. I can go into my app.py as I have here, and below this function, I can go ahead and define @app.route quote unquote /greet and just invent any route that I want. I can then define a function that will be called whenever that route is visited. By convention, to keep myself sane, I'm going to call the function the same thing as the route, but you don't have to do this. 
It's just to minimize decisions I have to make. And then in this function, what I'm going to do is this. Return render_template greet.html, which doesn't exist yet, but that's a problem to be solved. And then I can pass in the name of the user. I'm going to go ahead and save myself a line of code and just say request.args.get quote unquote name, quote unquote world. In other words, strictly speaking, I don't need that variable on its own line. This has the effect of what we already did in index, but I'm doing it all in one elegant one-liner. And now in index, insofar as I want the index of the site to just show the user the form via which they can type in their name, this one's easy now. Render_template quote unquote index.html and return that template. So to recap, here's index.html, which is now a form instead of a template for hello, such and such. App.py is going to return that template whenever I visit the index or slash of the page. And then this greet route is going to handle the case of printing out greet.html, passing in the user's name. All right, I think I'm not quite good to go yet, but let's try this out. Let me go back to my browser tab, reload, and there we have it. I have a web form now instead of the hello, so-and-so. I'm going to go ahead and type in my name. And notice the URL at the moment. Even though Chrome is hiding it, technically it's there, slash, but Chrome and most browsers today sort of hide as much stuff as they can if it's not all that intellectually interesting. But watch what happens to the URL when I click greet. It automatically sends me to /greet question mark name equals David. And this is just like the way the forms worked last week when we recreated our own version of Google in search.html. Because the action there was google.com/search, the user was whisked away to Google's server. Today I stay on the same server because the action I used was quite simply /greet, which is assumed to be on my own server. 
But clearly I screwed something up, because I have a big internal server error in front of me, as you soon will too, odds are, as you dive into this. 500 is the status code that means it's your fault somehow. Now why is that? Well, it's unclear from this generic black and white message. However, because I'm the developer, I can go back to VS Code, open my terminal window, and recall that I have two terminals open now. One that I can type stuff in, the other of which is still running from before. Let me open up that one. And you'll see, if I maximize my terminal window, a whole bunch of scary error messages here. But the relevant one is probably going to be, let's see, down here. Raise TemplateNotFound. jinja2.exceptions.TemplateNotFound: greet.html. So there's a lot of esoteric error messages here, more so than usual, but the simple fact is that I just screwed up and I did not create greet.html. So, file not found by the server. The user doesn't see all that complexity. That's deliberate by design. It's generally not good for cybersecurity if you're revealing to the user all of the error messages that are happening on your server, because maybe that suggests they can hack in some way by taking advantage of those error messages and the information implicit in them. But they are there in your terminal window to actually see and diagnose. So how do I fix this? Well, not a problem. Let me shrink my terminal window back down. Let me code a file called greet.html. And in greet.html, let's create the template via which I'm going to greet the user, which ironically is the exact same as index.html used to be. So, let me recreate that real quick. Doctype HTML. Let me close my terminal. Html lang equals en, head, title, hello, body, hello, and here's my placeholder: hello, name. So, to be clear, the index.html template doesn't have any curly braces or anything dynamic. It just spits out the HTML for the form. 
Greet.html spits out HTML and the actual greeting. And it's app.py that decides which of these to show the user: either index.html if they visit the slash route, or greet.html if they somehow find their way to the /greet route, which they will automatically by simply submitting that form. All right, so let's go back into this internal server error and go back to the form. Nothing has changed with the form, but now when I type in David and click greet, not only will the URL change to be /greet question mark name equals David, I actually now see the content that I expected a moment ago. All right. Well, now it's an opportunity to critique. I have these two templates open, index.html and greet.html. And even if you've never done web programming before, and even if you'd never done HTML before last week, what is bad about this design intuitively? >> Say again. >> Abstraction. >> Abstraction in what sense? >> Yes. So that's exactly the hangup I have here. There's a lot of duplication. And technically I didn't copy-paste, though I might as well have, because notice, as I very hintingly go back and forth, almost every line of code in these files is the same except for the form, which is there or not there, or the hello comma. Like, all of the boilerplate HTML, namely everything I just highlighted here, lines one through seven in greet.html. And this is what we really start to mean by a template. Like, wouldn't it be nice if we could factor out all of that HTML that's common to both files, put it in literally a template that both routes can use, so that I can write that boilerplate code once instead of again and again? 'Cause imagine in your mind's eye, well, if I have three routes or four routes or five routes, I'm going to be, like, typing the same darn HTML three, four, five times. That's got to be dumb, and that's got to be solvable, as we've seen in other languages as well. So, let me indeed go ahead and try to improve this. 
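For reference, the two-route version as it stands at this point might be sketched like this. The markup is reconstructed from what was dictated, and, as an assumption for the sake of a single runnable file, the two templates are inlined as strings (INDEX_HTML and GREET_HTML, made-up names) rather than saved in the templates folder. Note how much of the two strings is identical, which is exactly the duplication being critiqued.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Stand-in for templates/index.html: just the form, nothing dynamic.
INDEX_HTML = """<!DOCTYPE html>
<html lang="en">
    <head>
        <title>hello</title>
    </head>
    <body>
        <form action="/greet" method="get">
            <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
            <button type="submit">Greet</button>
        </form>
    </body>
</html>"""

# Stand-in for templates/greet.html: same boilerplate, different body.
GREET_HTML = """<!DOCTYPE html>
<html lang="en">
    <head>
        <title>hello</title>
    </head>
    <body>
        hello, {{ name }}
    </body>
</html>"""


@app.route("/")
def index():
    return render_template_string(INDEX_HTML)


@app.route("/greet")
def greet():
    # The form's action points here; its input arrives as ?name=...
    return render_template_string(GREET_HTML, name=request.args.get("name", "world"))


# Illustration only: fetch the form, then simulate submitting it via GET.
client = app.test_client()
form_page = client.get("/").get_data(as_text=True)
greet_page = client.get("/greet?name=David").get_data(as_text=True)
print(greet_page)
```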
And the syntax is a little weird, but it's the kind of thing you get used to quite quickly. I'm going to go ahead and create a third HTML file now by going back to my terminal window, inside still my templates directory. And by convention, this file is going to be called layout.html. Why this? That's what the Flask documentation tells you to do. So, in layout.html, I can put all of my boilerplate HTML, the stuff that is invariant and doesn't change. So, here we go. Doctype HTML, html tag, lang equals en, close bracket, open bracket head, open bracket title. We'll call it hello for all of the pages. Open bracket body. And here's where it gets interesting. The body is the only thing that has been changing in these two examples. In index.html, it was a web form. In greet.html, it was just a simple string of hello, so-and-so. So, what I want to tell Flask is that everything in the body will just be a dynamic block of code. And the syntax for that, which takes a little bit of getting used to, but it's also sort of copy-pasteable: block body, using percent signs this time. And because I don't want any such body in the template, I'm going to literally close this block as follows. And here you see another example of sort of HTML-like syntax, but instead of using angled brackets, Jinja, the templating library that Flask uses, uses curly brace and percent sign to open the tag and then the opposite to close it. So what you really have here are two Jinja tags, as we'll call them. This one is called block, and I'm defining an arbitrary name here. I could have called it foo, bar, or baz, but because I want this block to refer to the body of the page, by convention I'm going to call it body. And then this weird syntax, which is used in some other languages too, just means end whatever block you just began. And so again you just see reasonable people disagreeing. The people who invented HTML use nice angled brackets and words like these. 
The people who came up with Jinja used curly braces and percent signs. Why? Well, odds are these are not normal symbols that a human would type when writing code, at least in HTML. So they just chose something that probably wouldn't collide with actual syntax the human wants to use. So that's it for the template. This is essentially a blueprint that doesn't have just a placeholder for a single word or value like name. I can put a whole chunk of code here now instead. And how do I do that? Well, let me go into index.html, which at the moment is a little duplicative in that it's got all of this boilerplate. So you know what? I'm going to go ahead and delete everything that is already in my layout, both above and below that web form. And now I'm going to use a bit more Jinja syntax. This too takes a little while to memorize or copy-paste. But if I want index.html to use the layout.html blueprint, I can simply say extends layout.html and then close the tag using percent sign, close curly brace here. And then if what I want to plug into that layout is the following code, I can say as before block body, and then down here I can say endblock. And that's it. And just to be a little nitpicky, I'm going to de-indent that slightly. And now, even though it looks like web pages suddenly look a lot uglier, well, they do, because, like, this is weird-looking syntax, but I have now distilled index.html into its essence. This is the only thing that changes vis-à-vis the greeting page. And so I've put my HTML here that I care about. I've said to Flask, this is what index.html's body block shall be. Where to put it? Well, put it into that particular layout.html file. And so the logic for greet.html is the same thing. It's going to look just as weird, but again, you get used to it. Let's go ahead and delete everything that's boilerplate in greet.html, both above and below. Up at the top, let's tell Flask that greet.html, too, extends layout.html. 
And let's go ahead and say to Flask that the block called body shall be this for greet.html. And the end of this block is now down here. And just to be nitpicky, I'll de-indent that too. So again, the pages look a little weirder now, but it's going to follow a paradigm that we just see again and again, such that the only juicy stuff is what's inside of that body block. So now, if I go back to my layout, it looks exactly like this. This indeed is a placeholder, not just for a single variable like name or the placeholder we did before. This is the placeholder for a whole block of code that came from a file, not from a variable. And so if I go back into my other tab here, click back to go back to the web form, and reload, notice that I have the familiar-looking form. But if I look at view page source, notice everything that came from the server. Here's that boilerplate up here. Here's that boilerplate down here. And here's the stuff that's unique to this page. And recall too, aesthetically I de-indented it, which is why it's now no longer pretty printed in what the browser sees. Like, that's okay. There's no reason to obsess over the indentation and the pretty printing of what the browser sees. Ultimately, the reason I did this indentation is because, arguably, when I'm in VS Code here and I look at index.html, this is clearly indented inside of the body block just so I know what's part of that block. The browser does not care about superfluous whitespace or lack thereof. All right, questions on what we've just done here, which is to truly take this template out for a spin and now remove what redundancies I had accidentally introduced? Questions? No. Okay. Amazing. All right. Well, let's go ahead and look at this URL again. I'm not liking the fact that every example we've done thus far involves putting my name or Kelly's name right there in the URL bar. Well, why is that? 
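The layout, extends, and block mechanics just described can be tried outside of Flask with Jinja alone. In this sketch, the three files are held in a dictionary and loaded with Jinja's DictLoader; that choice is an assumption made so the example is a single runnable file, whereas in the lecture they are real files in the templates folder that Flask finds on its own.

```python
from jinja2 import DictLoader, Environment

# Stand-ins for the three files in the templates folder.
templates = {
    "layout.html": """<!DOCTYPE html>
<html lang="en">
    <head>
        <title>hello</title>
    </head>
    <body>
        {% block body %}{% endblock %}
    </body>
</html>""",
    "index.html": """{% extends "layout.html" %}

{% block body %}
    <form action="/greet" method="get">
        <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
        <button type="submit">Greet</button>
    </form>
{% endblock %}""",
    "greet.html": """{% extends "layout.html" %}

{% block body %}
    hello, {{ name }}
{% endblock %}""",
}

# autoescape=True roughly mirrors what Flask does for .html templates.
env = Environment(loader=DictLoader(templates), autoescape=True)

# Each child template is rendered into the shared layout's body block.
index_page = env.get_template("index.html").render()
greet_page = env.get_template("greet.html").render(name="David")
print(greet_page)
```

Both rendered pages carry the boilerplate from layout.html exactly once, and only the contents of the body block differ.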
Well, if I have, like, a nosy sibling and they sit down at my browser, they're going to see every URL I visited, including whose name was greeted. Now, that's not all that big a deal, but now imagine it's a username and a password that the form is submitting, or a credit card number that the form is submitting, or just search terms that you don't want the world knowing you're searching for. They're going to end up in the URL bar. Why? If you are using method equals get for the form, that's how GET works. It literally puts all of the HTTP parameters in the URL, which is wonderfully useful if it's potentially low-stakes stuff like the Google search box, or if you just want to be able to hyperlink directly to a URL like this. In other words, if I put this into an anchor tag, open bracket a href and a URL like this, I could deep link a user to a web page that just always says hello, David. So GET strings contain all of the requisite information to render a page for the user. But this isn't really good for privacy. So recall that there's not only GET, but there's also something called POST. And POST is just a different HTTP verb that essentially, with respect to those virtual envelopes from last week, sort of puts the information more deeply inside of the envelope, such that it's not written right there in the URL bar, but it's still accessible by the server. So if I do this, watch what happens. Let me go back into VS Code. Let me go back into index.html, which has the form. And let me quite simply change the method from get to post. And now let me go back to my other browser tab, back to the form, and reload so that the form knows that the method has changed. Now type in David and click greet. And before I do that, let me zoom in on the URL bar. Notice that the URL does change.
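As a rough illustration of where the parameters travel, the sketch below builds the text of a GET request and a POST request by hand with Python's standard library. The host example.com and the exact headers are assumptions for illustration; a real browser sends more headers than this.

```python
from urllib.parse import urlencode

params = {"name": "David"}  # hypothetical form field and value
encoded = urlencode(params)

# GET: the parameters are appended to the URL after a question mark.
get_request = f"GET /greet?{encoded} HTTP/1.1\r\nHost: example.com\r\n\r\n"

# POST: the URL stays bare; the same parameters travel in the request body.
post_request = (
    "POST /greet HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(encoded)}\r\n"
    "\r\n"
    f"{encoded}"
)

print(get_request)
print(post_request)
```

In both cases the server receives the same key-value pair; only with GET does it end up in the URL bar and the browser's history.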
I'm at /greet, but I haven't revealed to the world, or to anyone with physical access to my browser, what I just submitted. All they know is that I went to /greet, but not the key-value pair or pairs that were passed in. Of course, this clearly hasn't worked. I've got an HTTP status code of 405, which means method not allowed. That's because Flask by default, when defining routes, simply assumes that you want GET instead of POST. Now, GET is good for the default page. In fact, when I go back here, this is equivalent to me visiting the slash route just in the browser. So, I want my index to generally support GET, but the greet route should support POST. And the simplest way to do this is to pass in another argument to the route function, which we haven't needed before because the default is GET. And I can instead tell Flask a comma-separated list of the HTTP methods that I want this route to support. So if I want to support just POST, I can pass in a list containing just POST. And recall Python uses square brackets for lists, which are its version of arrays in C. Now by default, this argument is methods equals a list containing just GET. And that's why the only thing supported a moment ago was GET. That's why I'm now changing it to be POST instead. I have to make one other change, though. It turns out, if you read the documentation, when accessing HTTP parameters via POST instead of GET, you move from using request.args to request.form. This is completely unintuitive, that request.args is GET and request.form is POST, because they all come from forms. So it's bad naming, admittedly. So you just kind of have to remember request.args is used for GET. Request.form is used for POST. So all I need to do further is change this to be request.form, and that's it. Now my web application will support web forms submitting to it via POST instead of GET. Let me go ahead and type in my name. Now I'll zoom in. Notice that the URL will again change to /greet with no parameters evident.
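Here is a minimal sketch of a POST-only route along the lines just described, exercised with Flask's built-in test client instead of a browser. The inline template string stands in for greet.html (an assumption to keep the example in one file).

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)


@app.route("/greet", methods=["POST"])  # without methods=, Flask allows only GET here
def greet():
    # request.form holds POSTed fields; request.args would hold GET parameters.
    name = request.form.get("name", "world")
    return render_template_string("hello, {{ name }}", name=name)


client = app.test_client()

# A GET to a POST-only route is rejected with 405 Method Not Allowed.
get_status = client.get("/greet?name=David").status_code

# A POST carrying the form field is accepted.
post_response = client.post("/greet", data={"name": "David"})

print(get_status)
print(post_response.status_code, post_response.get_data(as_text=True))
```

The test client is only a convenience for trying routes without starting a server; the route itself is what would live in app.py.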
But I will be greeted this time because the server knew to look deeper into that envelope for those key-value pairs instead. And just to be sort of diagnostic about this, let me go back once more. Let me right-click or control-click on the page and go to inspect. Here's where developer tools can be super useful as well. I'm going to go in here and I'm going to go ahead and clear this. And now I'm going to type in David again and I'm going to click greet. But because I have the network tab open, like we played with last week, it's going to show me all of the requests going from my browser to the server, which is going to be useful here because not only do I see, okay, it obviously worked because I got back a 200, but if I click on this diagnostic output, I can actually go to the payload tab here and I'll see that the form data that was submitted was name, the value of which was David. So you can see what you're submitting. So you can do this today: if you want to log into some website, Gmail or otherwise, you can actually see all of the data that your own keyboard is submitting to the server even if it's using POST, because the browser that you control, of course, can see the same there. All right, any questions now on this transition from GET to POST? Either we're on a roll or it's not going so well. We'll see. All right, so what more can we do with this? Well, let's give ourselves a couple more building blocks before we transition to actually implementing some real-world problems, as I did years ago with one such example. Suppose that I don't like this direction I'm going in, insofar as every time I have a page with a form, it submits to another route altogether. Because in your mind's eye, just kind of extrapolate. Well, if I have two forms on my page, I now need four routes. If I have three forms, I need six routes. It seems a little annoying that you use one route just to show the form and another route to process the form.
This is going to get annoying over time because it's like twice as many routes as might be ideal. So, is there a way to get kind of the best of both worlds and combine these two routes into one so that everything related to greeting the user all happens in one place? Well, you can, as follows. What I'm going to go ahead and do is delete my greet route altogether and most of my index route. But I'm going to ask a question. I'm going to first say that the methods that the index route supports now shall be both GET and POST, as a comma-separated list there. And then inside of my index route I can simply ask a question of the form: if the request that is submitted to the server has a method of POST, then assume that the form was submitted. This is just a Python comment, a note to self that I'm going to come back to in a moment. Else, if the request method is not POST, so I could technically say elif request.method equals equals GET, but this is kind of dumb because I only support two verbs. So I might as well just assume, for efficiency, that else handles the GET implicitly. Then go ahead and assume that no form was submitted. So show the form. So these are just notes to self as to what I want to do. So how do I show the form? Well, this line was easy: return render_template of index.html. If, though, the form was submitted, what do I want to do? Well, just as before, let's return render_template of greet.html, passing in a name value of request.form.get, quote unquote, name, else a default value of world. So, it's the exact same logic from each of the two functions a moment ago, but I've now combined them into one by just using some conditional logic and just asking the server: if the user got here via POST, well, the only way they could have gotten here via POST is by having clicked that button and submitted the form. So, let's just go ahead and greet them. Else, if they got here via GET by just typing in example.com or whatever the actual URL is, let's go ahead and show them the template.
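The combined route just described might look roughly like the sketch below. The two inline strings are hypothetical stand-ins for index.html and greet.html, used with render_template_string so the example fits in one file; the lecture's version calls render_template on real template files.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Hypothetical inline stand-ins for templates/index.html and templates/greet.html.
INDEX = "<p>the form from index.html would go here</p>"
GREET = "hello, {{ name }}"


@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        # Form was submitted
        return render_template_string(GREET, name=request.form.get("name", "world"))
    # Form was not submitted (GET), so show it
    return render_template_string(INDEX)


client = app.test_client()
shown = client.get("/")
greeted = client.post("/", data={"name": "David"})
print(shown.status_code)
print(greeted.get_data(as_text=True))
```

One route now both shows the form and processes it, branching only on request.method.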
So, it's still good design in that I have a separate template for each of these pieces of functionality that is only minimally different, but I'm sort of deciding which of those to show based on the actual logic in this here app. All right, so this is almost perfect except for one bug. What else needs to change if I've just combined my greet route and this default slash route as well? Yeah. Yeah. So, in the form that's in index.html, recall that there's an action attribute that specifies to what URL you want to submit this. Well, let me go back to index.html. It can't be /greet anymore because that doesn't exist. So, I'm just going to delete the word greet and submit it to slash instead, which has the same effect as omitting the action entirely. If you don't specify an action, the form submits to the very location that it came from. But if you want to be pedantic and even more clear, just specifying that the action now of this form is just this, then that will work here, too. All right, so let's test it. Let's go back to the other tab. Back to the form, reload. It's blank now. I type in David. Click greet. And this, too, is working. But again, if I go back and reload, GET is working as well. But there's nothing ending up in the URL because I'm now using POST, which again tends to be a good thing for privacy reasons as well. Let me show one final flourish before we transition to something real-world motivated. If I go into app.py, for a while now, I've been passing in this default value of world, which is fine, especially if it's something short and sweet. That's the default value. But I can actually put a bit of conditional logic in my template as well. So, in fact, let me go into greet.html and trust that I will now be passed a name variable. But I can decide for myself in the template whether I want to say hello, name, or, if it's blank, hello, world instead. And how might I do this?
Well, I can always say hello, but then I'm going to use some Jinja syntax that we haven't seen yet. But it turns out in Jinja, the templating language that Flask uses, you can use Python-like syntax too. And you can ask questions like: well, if the name variable has a value, then go ahead and output the value of that name. Else, if the name variable does not have a value, go ahead and output a literal value like world. And then down here, endif. So Jinja again is a little weird in that it says endblock, endif, but that's the way it is. But even though this looks a little weird, it's just a nice, clever way of putting a bit of logic into my template. And if the name has a value, so it's not empty or None, go ahead and display it. Hence the curly braces. Else go ahead and literally say world. Why is it not problematic, and you can see the dots here, that there's all of this whitespace after the word hello? Otherwise this would seem to create quite a messy paragraph or phrase of text in terms of whitespace. But >> HTML ignores superfluous whitespace. So anything more than a single space just gets canonicalized or collapsed into a single space. And we saw that, recall, last week accidentally when I had those three paragraphs of text from the duck, but I wanted them deliberately to be separate paragraphs and they weren't, because all of that whitespace was ignored until I actually introduced the paragraph tag instead. So this just moves some of that logic now to the templates. So for all this logic and more, here's the official documentation for Flask and specifically Jinja's own documentation, but for the most part, we've seen what's possible already. And I promised a real-world example. So here now it is. So back when I took CS50 as a sophomore, there was no web programming in the class. And frankly, there was barely any web actually in the world because it was all so new, HTML and the like.
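Returning briefly to the template conditional described a moment ago, here is a small sketch using the jinja2 library on its own. The template text is a guess at what greet.html's body might contain, and the whitespace collapsing is only a rough imitation of how a browser displays text.

```python
from jinja2 import Template

# Hypothetical body of greet.html: say hello to the name if given, else to the world.
GREET = Template(
    "hello,\n"
    "{% if name %}\n"
    "    {{ name }}\n"
    "{% else %}\n"
    "    world\n"
    "{% endif %}\n"
)

for value in ("David", "", None):
    rendered = GREET.render(name=value)
    # Collapse runs of whitespace roughly the way a browser would when displaying text.
    print(" ".join(rendered.split()))
```

An empty string and None are both falsy in the template, so both fall through to the else branch.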
But it was my sophomore spring, maybe, or junior fall that I also got involved in the freshman intramural sports program, or Frosh IMs for short. And back in the day we would walk from, say, Matthews Hall to Wigglesworth, freshman year at least, to register for sports by filling out what was called a sheet of paper, and then you would go to the proctor's dorm room and slide it under their door or through the mail slot, and that's how we registered for sports. It was sort of ripe for disruption before that was even a phrase. And so one of the very first projects I took on myself personally after taking CS50 was to figure out how web programming worked. And Python wasn't really a thing yet, nor were half of the topics we've been talking about thus far. But at the time I learned a programming language called Perl. I learned a little something about CSV files, which we did a couple of weeks back too. And I built this, the freshman intramural sports website, via which you could click on a bunch of links and get some information. But most importantly, you could register for sports by typing in your name, selecting the sport for which you want to register, clicking submit, and no longer walking across Harvard Yard with a piece of paper to actually register for sports. So, we thought we'd use this as sort of the beginning of a motivation for how we can now solve problems using web-based interfaces and code. And also what not to do: background images that repeat like this are not really in fashion anymore, nor, arguably, were they in 1997. But let's leave that as a cliffhanger and come back in 10 minutes, after a snack, with re-implementing the Frosh IMs website. All right, we are back. So among the goals now are to recreate the beginnings of a site like this for Frosh IMs, whereby we want to enable students to visit a form, fill out that form, and submit it to a server and then register.
And we'll dispense with all of the amazing graphics and such and keep it fairly simplistic and core HTML. So let's go ahead and do this. Back here in VS Code, I've gotten ready now for this next set of examples. And in particular, I've created in advance a directory called froshims containing app.py, requirements.txt, and templates, which are essentially the same as the ones we just created, but I stripped out the hello and greeting-specific stuff. I'm going to go ahead in this terminal and do flask run. So, I get the server up and running again on port 5000. And then I'm going to go ahead and open up another terminal here as I did before, cd into froshims in that terminal, where I'll see the exact same files, and I'll give you a quick tour of what I created in advance. So here in app.py is quite simply the simplest of applications that just renders the index.html template, with an expectation that in a moment we're going to make it more interesting than that. Meanwhile, if I open my terminal again and open up requirements.txt, it just mentions Flask, but it's already installed. So no more to say about that for now. Now, let me go ahead lastly and open up the templates folder. There are two files there, the first of which is layout.html, which looks almost the same, except I did add a slightly more user-friendly tag to the head of the page, which you might not have seen before, but this is a tag that essentially you can copy and paste into templates of your own and that helps the content of a page resize to be mobile friendly. In fact, without this line, if you were to develop problem set 9 or your final project for the web and then try to access the site on a phone, everything might look quite a bit too small, font sizes and more. This line tends to help browsers resize dynamically so that the page actually matches the width of the device's own width. For instance, a phone versus a laptop or desktop.
But otherwise, everything else is the same there, including the placeholder for the body block that I've defined here on line 9. Lastly, there's one more file that at the moment doesn't do anything all that interesting except that it's ready to contain the contents of the registration form for Frosh IMs. So, let's go ahead and start with actually that. Let me quickly whip up a form that minimally gives the user something that they can submit to the server to register for sports, and then we'll improve upon it a bit iteratively. So, here inside of the body of index.html, which is going to extend the actual layout, the blueprint we already created, I'm going to have a quick title for the page like register, just to make clear to the student what they need to do, using the h1 tag, which is the big and bold tag. Then I'm going to go ahead and have a form tag whose action is going to be anything I want, but since I want the user to register, I'm going to have it go to /register, which makes more sense semantically than greet now because we're doing something else. The method I'm going to have the student use is post, if only because they don't want their roommates knowing what they visited in their browser. So this way it will tuck the HTTP parameters deeper in that virtual envelope so they're not stored in the browser's history. Inside of this form, I'm going to have minimally an input box for the student's name. So I'll call that, aptly, name and set name equal to name in my HTML. The type of this text box will be exactly that, text. And then just to make it a little more user friendly, I'm going to add a placeholder of name so they know what to do. I'm going to go ahead and turn off autocomplete in case multiple roommates want to register from the same computer. And then we'll turn on autofocus to put the cursor in that name box.
And then, and you didn't see this last week, but if you've ever wondered how drop-down menus are implemented in HTML, if you've never done this yourself, those drop-down menus on web pages are called select menus. And if I want the user to select a sport to register for, I'm going to call this input sport. And this is an alternative to just having a generic text box where we have the students type in the sport they want to register for, which would be fraught with typographical errors and changes in capitalization. A drop-down menu, of course, standardizes what the human can select. So inside of this dropdown I'm going to have a few options, the first of which will be basketball, for instance, the second of which will be soccer, and the third of which, I think these were the first three with which we debuted back in the day, will be ultimate frisbee. Now these option tags can take some attributes. By default they will take on the value of whatever words are typed in between the open and close tags. But just to be pedantic, I'm going to make clear that the value of selecting this option shall be basketball. But I could change it to be something else if I so chose. The value of this selection will be soccer, and the value of this last option will be ultimate frisbee, just in case I want to store something else in my database ultimately. Now that is a complete index.html, I think. So if I go back to my browser tab, which previously was showing me the hello program because I stopped and restarted Flask, and you can stop Flask by just hitting control-C to interrupt it, I'm going to reload the page and I should now see, okay, a slightly more interesting form with a name box, where the cursor is blinking, and then a select menu, a dropdown with three options. Now it's a little presumptuous of me to select basketball by default, and in fact this is kind of inviting user error if they type in their name, don't really think about it, and now register for basketball accidentally.
So I'm going to make a couple of improvements here. I'm actually going to have essentially a blank option at the top whose value is nothing, and I'm going to have it just labeled sport. And just to be super clear, I'm going to select this value by default. So the option tag in HTML supports not only a value attribute but, it turns out, a selected attribute, which if present means that's the option that will be selected by default. So if we go back now to this page and reload to get a new copy of the HTML, it looks a little better. I still have the name at left, but the sport menu now looks like this. So, it's a little more clear what I want them to do from this dropdown. And sport deliberately on the back end won't have a value. And theoretically, this will help me determine if they actually selected a sport or just clicked register and ignored the dropdown. But I do need a way for them to register, ideally by clicking a button. So, I'm going to add a button, the type of which is submit. And then I'm going to have this button's label be register. So now if I go back to the form once more, reload, I now have, I think, a complete form, albeit not very pretty, via which David can register, for instance, for basketball by clicking register. And ah, darn it, I have a 404 not found. But why is that? Why is nothing yet found? Why is /register not found? Yeah, >> what's that? >> I haven't... Well, I haven't linked the option to anything? I think the form has been linked. The form is telling the browser to go to /register. So, this is correct behavior. But if we go to app.py, there's no route defined for /register. So, of course, it's not found, because there's an infinite number of routes that don't exist, and register is currently among those. So, I can define that myself. I can say app.route, quote unquote, /register. I do want to use POST. So I need to proactively say that the methods this function will support will indeed be POST instead of the default of GET.
I'm going to define an actual function to call when this route is used. And by convention I'm going to call it just register, even though I could call it anything I want. And inside of my register function, well, for now I'm going to cheat a little bit. I'm going to at least check that the user has given me a name and a sport. So how can I express this? Well, because I have already imported the request global variable that comes with Flask, I can ask questions of it. And I can say something like: if it is not the case that request.form.get of name has a value, or if it's not the case that request.form.get of sport has a value, then let's go ahead and give the user a warning of sorts. I'll return render_template of a file called failure.html. This doesn't exist yet, but no big deal. Let me go back into my terminal. Let me go into templates and create a file called failure.html. And in this file, I'm going to say that it extends layout.html. And then it has a block body inside of which is going to be something super trivial for now, just to get us going. And this failure page is simply going to say you are not registered, exclamation point, and then endblock. So that's it. Just sort of an error page that now exists. I'm going to close it, out of sight, out of mind. But I think this now will work. If it is not the case that the user gave us a name, or it's not the case that the user gave us a sport, we will show this error message. Otherwise, if all seems to be well, for now, we're not going to do anything useful with the information, but I'm going to go ahead and return render_template of success.html, which is simply going to assume that the user was successfully registered. So, let's whip that up quickly.
I'm going to go ahead and code up success.html, inside of this file, which will similarly extend layout.html, inside of which there's a body block that quite simply says, how about, you are registered, and we'll just pretend that it is so, endblock. So that's it. In short, I want the two templates that show failure or success, respectively. So I think now in app.py, we're in better shape. I now have a register route that will get called if POST is used to visit it. And I'm going to check request.form, which is where you get the POST variables from, to check whether name or sport is provided. And I'm going to render a template accordingly. So let's try this. Let me go back to my other tab and go back to the form. Let me type in my name, David, but no sport. Click register, and I have an internal server error, which was not intended. So, let's figure out how to diagnose this. So, it seems to be the case that I'm at /register. That was intended, but something clearly went wrong. So, let's go back. Now, I could just kind of stare at my code endlessly, but recall that there should be some hints in my terminal window that's running Flask. So, let me go back to my other terminal, and there it is. Unexpected char, double quote, at line 11. Well, look, sounds like user error. So, that is in failure.html. And you can kind of see it because Flask is underlining it literally for me. What did I do that was stupid? Yeah, I just didn't close my quote. So, amateur hour here. So, I do need to open it after all, ironically. So, let's go ahead in my other terminal and open up failure.html. And there it is. One stupid character away from correctness. All right, let's close this again. Go back to the other tab. Let's try this again. David as my name but no sport. Register. Okay, you are not registered. I don't know why, but I know I'm not registered. Let's try it again with no name, but yes, a sport. Click register. You are not registered.
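The register route and its two outcomes might be sketched as follows. The inline strings are hypothetical stand-ins for failure.html and success.html (which in the lecture extend layout.html), and Flask's test client plays the role of the browser.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Hypothetical inline stand-ins for templates/failure.html and templates/success.html.
FAILURE = "You are not registered!"
SUCCESS = "You are registered!"


@app.route("/register", methods=["POST"])
def register():
    # An empty or missing field is falsy, so either check triggers the failure page.
    if not request.form.get("name") or not request.form.get("sport"):
        return render_template_string(FAILURE)
    return render_template_string(SUCCESS)


client = app.test_client()
attempts = [
    {"name": "David", "sport": ""},
    {"name": "", "sport": "Basketball"},
    {"name": "David", "sport": "Basketball"},
]
results = [client.post("/register", data=data).get_data(as_text=True) for data in attempts]
for data, result in zip(attempts, results):
    print(data, "->", result)
```

Note that this version only checks that some value was supplied for each field, not that the sport is one actually on offer.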
All right, just for good measure, let's give no name and no sport. You are not registered. So, that seems to be working. Let's now cooperate. Let's go ahead and register as David for basketball. Cross my fingers. Damn it. An internal server error. Let's try to learn from my past mistakes. Let's open this up and eyeball it. I did it twice, even though that was not copy-paste. So 0 for 2. All right, let's go back here. Notice now I can actually just click reload because the browser is smart enough to remember what I just posted to the server. So if I click reload, you'll be prompted to confirm the form submission, lest you be doing this on a website with your credit card or something where you don't want to send it twice. But in this case, I'm fine with sending my name and basketball twice. So I'm going to click continue. And this time it worked, telling me that I'm actually registered. So I'm not doing anything with the student's data, but at least I am validating that they gave me some input. Now there's a catch here. The catch, of course, with HTML is that it's all executed client side. And so for instance, suppose that a student is really upset that we only offer basketball, soccer, and ultimate frisbee. And maybe they really want to register for volleyball even though we're not offering volleyball. Well, there's arguably a security vulnerability here where technically my code right now will tolerate any user input even if it's not in that dropdown. Because after all, let me go ahead and right-click or control-click on my web page and open up the developer tools. Let me go into the form as sort of a hacker-type student. Let me go into the select menu and, okay, no big deal. If I want volleyball to exist, well, I just need to know a little HTML. I'm going to right-click on that element and click edit as HTML. This literally lets me start editing the HTML of the page. I'm going to give myself my own option. Option value equals volleyball.
Close bracket, volleyball. Enter. And now when I close developer tools, woohoo, I can register for volleyball if I want. So let's select volleyball. Type in, maybe, Kelly is hacking the site. Register. And she is registered for volleyball, apparently. All right. So the takeaway here is do not trust user input, ever, for reasons we've already seen when we discussed SQL, and ever more so now that we're dealing with the web, because who knows what users are going to do accidentally, foolishly, or even, in Kelly's case here, maliciously, trying to pass data that we did not expect. So what would be the defense against this? This is just how HTML works, and assume that I'm actually registering Kelly for sports now and somehow she's now signed up for volleyball in our database. What would a solution be, logically, here? Yeah. >> Yeah. So maybe do some server-side validation. So don't just blindly check that we have a value from the user. Actually check that it's one of those sports. So if I go back to app.py, I could do this in a few ways. And maybe my first instinct would be this. Let's check for the name and do this. But let's also do this. Like, if request.form.get, quote unquote, sport. And actually, let's put this in a variable just to make it even easier to type. So, sport equals this. If sport does not equal, what was it, basketball, and sport does not equal soccer, and sport does not equal, quote unquote, ultimate frisbee, then render an error. So, return render_template, quote unquote, failure.html. So, now if I go back to this form and try to register as Kelly again, you are not registered. So, I somehow caught her because volleyball, of course, is not in the list of sports that I put there. But what might you not like about this approach? Even if you've never done web stuff before, what's bad about this?
>> Yeah, I have to hardcode every single sport now, not only in app.py to check for the validity on the server of what the human has typed in. But recall that the dropdown itself came from index.html. So I now, in duplicate, have to put all of the sports there too. So this just seems bad, to have duplication. And so better might be to do something more like this at the top of my file here. Why don't I go ahead and just give myself a global variable, which in the context of this web app is perfectly reasonable. So I can access it anywhere. Let's call it SPORTS in all caps just to note that this is a global variable and constant. Even though Python does not have constants in the sense that C does, this is sort of on the honor system. If you see a variable in all caps like this, just don't mess with it. Use it, but don't mess with it. So, inside of the square brackets, this is going to be a list of the sports that I do want to support. So, basketball, soccer, ultimate frisbee, and that's it. Now, instead of doing all of this, what I can instead ask is a simpler question like this. If sport not in SPORTS, then go ahead and return render_template, quote unquote, failure.html. And I can actually tighten this up a little bit. I don't need two calls to failure.html. Why don't I just borrow this code and say or sport not in SPORTS, render a failure. And now I've tightened this up quite a bit more, but I'm essentially using Python to just ask: is the sport that... Oops, sorry, I deleted too much, so the sport variable no longer exists. Actually, let's just tighten it up further and use request.form.get, quote unquote, sport, directly. So if the sport that the human typed in or selected from the dropdown somehow is not in this global list of possible sports, well, then it's a failure. Don't let Kelly or whoever register instead. But if I now have this global variable, I can be a bit smarter in my template.
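The membership check just described can be tried on its own in plain Python, without Flask. The function name is_valid_registration and the use of an ordinary dictionary in place of request.form are assumptions for illustration.

```python
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]  # all caps: treat as a constant


def is_valid_registration(form):
    """Return True only if a name was given and the sport is one we offer."""
    return bool(form.get("name")) and form.get("sport") in SPORTS


print(is_valid_registration({"name": "David", "sport": "Basketball"}))
print(is_valid_registration({"name": "Kelly", "sport": "Volleyball"}))
print(is_valid_registration({"name": "", "sport": "Soccer"}))
print(is_valid_registration({"name": "David"}))
```

Because the check runs on the server against one central list, editing the page's HTML in the browser cannot add a sport that the server will accept.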
I don't need to manually write out all three of these sports here. Instead, I think I can be smart about this. And when I render index.html itself, why don't I just pass in a variable called sports, for instance, and set it equal to the value of that global list. And then in my template, and here's where templating again gets interesting and starts to save you time, let me go into index.html, delete all but the default value, the blank one, and do something like this. Jinja, it turns out, also supports loops like Python: for sport in sports, using the curly braces and the percent signs. I can now dynamically generate options, as many as I want. So option value equals, quote unquote, the current sport, close quote there, close bracket, sport. So it's a little redundant, but again, this is just how HTML is. This is what the human sees. This is the value that gets submitted to the server, in case you want one to differ from the other. And then below that option line, I can say endfor, which is a bit weird, but that's how it works in Jinja to stop that loop. So this is kind of powerful. Now if I have three sports, 30 sports, all of the options will be dynamically generated by this template. And so now we're starting to save ourselves time, and I can centrally manage all the sports by just updating this global list here in app.py. So, let's go back to the browser, back to the form, reload, and you'll see that the dropdown thankfully still works the same way, but all of those options were dynamically generated. Indeed, if I view page source from my browser, you'll see, and there's some extra whitespace there because the loop was adding some whitespace on each iteration, I still have the three sports, but not volleyball, as was my intention. So now if Kelly even tries hacking this version of the site by going in here and typing in volleyball manually and registering, the logic will still catch it because only those three sports are in that array.
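A sketch of the loop-generated dropdown, again using the jinja2 library directly so it runs without a server. The template text is an approximation of the select menu in index.html, not a copy of the lecture's file.

```python
from jinja2 import Template

SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]

# Approximation of the select menu in index.html; the blank option stays hard-coded.
SELECT = Template(
    '<select name="sport">\n'
    '    <option selected value="">Sport</option>\n'
    "    {% for sport in sports %}\n"
    '        <option value="{{ sport }}">{{ sport }}</option>\n'
    "    {% endfor %}\n"
    "</select>\n"
)

# In Flask this would be render_template("index.html", sports=SPORTS).
html = SELECT.render(sports=SPORTS)
print(html)
```

Adding a fourth sport to SPORTS is now the only change needed for it to appear in the menu.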
So it's perfectly fine for me now to register for basketball because it's among the sports in that array, sorry, in that list, not array. Questions on any of these techniques? All right, how about another type of form? Select menus are nice, but you also might see radio buttons on websites, which are the mutually exclusive little circles that you can select to choose one or another option. Let me go back to index.html and just show you how those can be created as well. Instead of using a select menu, it turns out we can create a whole bunch of inputs of radio button type as follows, one for each sport. So for sport in sports, let's go ahead and output, in between this tag and the endfor, the following: input type equals radio, and let's give it a name. The name of this radio button is going to be sport, and the value of the current input is going to be, in quotes, the current sport. And the word that the human's going to see is, as before, sport. So notice it's just another type of input. Previously we've seen text, for instance, two lines above. We also saw search last time. We saw email. There's a bunch of text input types. This one, though, is going to display as a radio button instead. And the human is going to see this label here. If I now go back to my other browser tab, click back, and click reload on the form, I should see, and it's not pretty, a radio button in the sense that these are mutually exclusive. How does the browser know that I should only be allowed to select one of them? Well, because I use the same name for each of those radio buttons, it knows that means mutual exclusivity. In fact, if I view page source in the browser, you'll see that all three of the inputs that were dynamically generated, type equals radio, type equals radio, type equals radio, also have identical names. And so that's just how that works. And that's the only change necessary.
If I now go ahead and type in my name, David, pick basketball, and click register, we're still up and running, because what the server gets is still exactly the same inside of request.form. You can still access name or sport no matter what type of input it was in the user's own browser. Questions on these techniques? All right. Well, it's kind of obnoxious that when you don't do something right in this website, like forget your name but do select a sport, all you are told is, generically, you are not registered. It'd be nice and much more user-friendly, better UX, user experience, so to speak, to actually tell the user what's wrong so they can fix the problem. Now, there's a bunch of ways we can do this, but I'm going to propose that we do this. Let's create a template called error.html, whose purpose in life is just to tell the user a little something more about what they did wrong. So, I'm going to go back into my terminal window here. I'm going to code up a file called error.html. Enter. And I'm going to go ahead and, as before, extend layout.html, learning from my past mistakes and closing that quote. Then I'm going to go ahead and do body block down here. And then inside of this block body, I'm going to have just some simple text, like an h1 tag that just says error to the user, then a paragraph tag that's going to contain some error message to be determined. And that's it for now. So I've got the template for an error message screen. Let me go back into app.py now and add some logic, because app.py does know what's wrong. It's just that at the moment we're very generically returning a failure template instead of something more precise. But if I know that the user hasn't given me their name, let me say that in the error message. So, let's actually get rid of these two lines and be a little more specific. How about we validate the user's name first?
So, name equals request.form.get("name"). That just gives me a variable containing the user's name. If they didn't give me a name, which I can express with just if not name, as in if name is blank or None, then let me return render_template of that error template. But let's pass in a specific message like missing name. And so by passing in another argument to this template called message, I can trust that Flask will dynamically output that message where I tell it to, using the old curly braces. Meanwhile, let's go ahead and validate not just the name, but the sport. I can do this in a couple of ways. Let's do this. So sport equals request.form.get("sport"). Then in here, let's say if there's no sport, return render_template("error.html", message equals missing sport). So quite like name. But we can be more specific now, too. If the sport they did give me is not in the global SPORTS list, well then it's Kelly trying to register for volleyball again. So let's return render_template of error.html, but this time the message shall be invalid sport or something like that. So, we're being ever more clear. Otherwise, they are presumably confirmed because we got this far logically. So, if I go back to the other browser tab, go back to the form, type in no name, and just click register... Okay, [laughter] what did I do wrong accidentally? So, let's go back to VS Code, open my terminal, open the first terminal window where flask run is running. Encountered unknown tag 'body'. So I did something stupid in error.html. So let's go into error.html and, body block. Oh, that's subtle. I just transposed the words. It's supposed to be block body. That was dumb. All right. Block body. I think that's correct. So let's go back to the browser. Let's reload. It's prompting me to reconfirm that I want to submit the exact same form, which recall had no name and no sport.
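The order of those three checks can be sketched as one small function. This is a sketch under assumptions rather than the lecture's route. The name validation_error is hypothetical, a plain dict stands in for request.form, and where the real route would return render_template("error.html", message=...), this version just returns the message string.

```python
# Hypothetical stand-in for the global list of supported sports.
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]


def validation_error(form):
    """Return a specific error message, or None if the form is valid."""
    # Validate name first: blank and missing are treated the same.
    name = form.get("name")
    if not name:
        return "Missing name"

    # Then validate that some sport was chosen at all.
    sport = form.get("sport")
    if not sport:
        return "Missing sport"

    # Finally, reject sports that were never offered (e.g. edited HTML).
    if sport not in SPORTS:
        return "Invalid sport"

    return None  # got this far, so the registration is acceptable


print(validation_error({"name": "", "sport": ""}))
print(validation_error({"name": "Kelly", "sport": "Volleyball"}))
```

Checking the name before the sport is why a form with neither field filled in reports only the missing name.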
But now I see an error, in a good way. This is not a server error. This is my error: missing name. Now it's not super user-friendly, but it's at least more explanatory than you are not registered. All right, let's go back. Let's give it a name, but no sport. Register. Ah, missing sport. Let's go back. Let's go ahead and give it a sport, but no name. Missing name, as before. And if I took the time to actually hack the HTML and do what Kelly did before and add volleyball, it would similarly say invalid sport in this case, too, because it's not in that same list. All right, questions on this technique? All right. Well, it's all fine and good to have a registration site that does this, but it's literally just throwing out the information. And what I did years ago was actually cut a corner initially, where I think I wrote code that just sent an automatic email to the proctor running Frosh IMs containing the person's name and the sport for which they registered. But that was very quickly replaced by a better feature, which is to actually store the data on the server itself and keep track of it rather than just send it off via email. So let's do a first pass at actually storing information on everyone who has registered for sports. Well, let me go up here and create another global variable to make my life easier, called REGISTRANTS, and set this equal to open curly brace, close curly brace. What do these two characters represent, especially if empty? What data type is this? It's a dictionary. So, it's a Python dict. You could similarly say dict explicitly, open paren, close paren. But it's generally more Pythonic to just use two curly braces. This is just giving me an empty dictionary. Why? Well, I want to store the two things I'm collecting about all of the students, their name and the sport for which they registered. So, key, value: name, sport. So, how can I go about doing this? Well, it's pretty trivial.
Down here in my register function, recall that I'm just kind of naively saying you're registered even though I'm not doing anything with their name or sport. But that's easy. Let's remember the student for real now. So in that REGISTRANTS dictionary, let's index into it using the student's name, David or Kelly or whoever, and set that equal to the sport for which they registered. And notice the name is coming, as before, from request.form.get. The sport is similarly coming from that function. And so this is just remembering that key-value pair. So that's all fine and good. It's in the computer's memory. How do we actually see it? Well, wouldn't it be nice after you register if you could see the actual registrants of the website, certainly if you're the proctor trying to run the sports? Well, yes. So, let's go down here and create another route like /registrants, which is just going to give me a list of everyone who's registered. Let's define a function called registrants, though I could call it anything I want. And this one's going to be relatively simple. Let's render a template called registrants.html, which will soon exist, and pass in all of the registrants that are in that global dictionary. And again, I can call this placeholder anything I want, but insofar as it contains the registrants, I'm setting registrants equal to the REGISTRANTS global dictionary. So let's go now into my terminal window and create registrants.html, and create really the beginnings of an actual Frosh IMs website that's going to show the proctor who has now registered. So let me go into this terminal and do code of registrants.html and close the terminal. Let's try to get this right, finally: extends layout.html, close quote, close bracket there. Then let's do block body, in the right order. Then endblock down here. And then inside of the block here, this is going to be a bit more of a mouthful, but let's use some of our HTML from last week.
We'll give an h1 tag that says registrants so the proctor knows what they're looking at. Then let's put this in a table, for instance, with two columns, names and sports. So a table tag, followed by a thead tag for the table heading. Then that heading is going to contain just a single row, so tr. And that row has th cells, for table heading, one of which, and actually I'll make it tighter, is name. The other of which is going to be sport. So these are the column headings, the table headings, th tags for short. After the head of the table, let's go ahead and do a tbody for table body. And inside of here, this is where Jinja comes in useful. I can say for each name in the registrants placeholder that was plugged in, and endfor proactively. What do I want to do on each iteration? Well, I think I want to output table row, table row, table row. And in here I can do tr, and then inside of that a table data for the cell on the left, putting in the student's name, which is coming from this for loop just like in Python. And then one more table data, namely the registrants placeholder indexed into at that name, which, because it's a dictionary, will give me the sport for that student's name. And then I think we're good to go. And in fact, just to hark back to something I said last week when we were imagining, actually this was in week five when we were talking about stacks, how your Gmail or Outlook inbox is essentially a stack with the newest emails on top. And I hypothesized at the time that it's just row after row after row after row when we started talking last week about HTML. Here is what Google and Microsoft and others are probably doing. Anytime you have tabular information in a page, they've got some data in memory like the registrants, and they're just using code like this in Jinja to output table row, table row, table row. Imagine this is your email instead. Same exact idea. And now we have the ability to express that kind of logic. So let's go back now into the browser.
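The dictionary-backed version can be sketched in plain Python to show both halves: storing a name-to-sport pair and then looping over the dictionary to emit one table row per registrant. The lecture renders the rows with a Jinja loop in registrants.html. The helper names register and render_rows here are hypothetical, and the rows are built with an ordinary Python loop so the sketch needs only the standard library.

```python
from html import escape

# Global dictionary mapping each student's name to their sport.
REGISTRANTS = {}


def register(name, sport):
    """Remember the registrant as a key-value pair in memory."""
    REGISTRANTS[name] = sport


def render_rows(registrants):
    """Emit one <tr> per registrant: name on the left, sport on the right."""
    rows = []
    for name in registrants:  # iterating a dict yields its keys
        sport = registrants[name]  # index into the dict to get the value
        rows.append(f"<tr><td>{escape(name)}</td><td>{escape(sport)}</td></tr>")
    return "\n".join(rows)


register("David", "Basketball")
register("Kelly", "Soccer")
print(render_rows(REGISTRANTS))
```

One consequence of keying on the name is that a second registration under the same name overwrites the first, and everything is lost when the process exits, since the dictionary lives only in memory.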
Click reload on the form. Let's register, for instance, David for basketball. Click register. It claims I'm registered. But hopefully now I'm legitimately registered because that variable is storing it in memory. And in fact, let's go now to not /register, but, and I'll zoom in at the top, /registrants, and hit enter. And we will see a very ugly but functional HTML table containing two columns, name and sport, in the so-called thead, below which David and basketball are present. Moreover, if we now go back to that form and try registering Kelly, for instance, for soccer, click register, and manually go to /registrants again, now Kelly and David are in the server's memory as well. Questions then on what this example is now doing or how it's achieving these results? Yeah. >> Really good question. If you wanted to restrict the registrants page to only certain people, ideally you would have a password on it. And in fact, one of the next examples we'll do in a few minutes is a login page for exactly that reason. Right now, it's just sort of on the honor system that only the proctor in question goes to this URL. But just for the sake of discussion, suppose that you did want the registration list to be public, if only to hype up who has already registered. Well, it's not very good to just tell people to go to the /registrants URL. We can actually link them to that in a few different ways. So for instance, I can go down to, let's say, success.html. So let me open up success.html. It just says you are registered. I can do something like this: a href equals /registrants. I have control now over my HTML and the routes, so /registrants will exist. See who else registered. Period. So, this will create a nice little HTML link that links me to that route. So, let's try this. Let's go back to the form over here. Let's go ahead and register John for ultimate frisbee and register. All right. And now we see you are registered.
See who else registered. And if I hover over this, it's super small, but it would have shown me the link in the bottom left corner. And indeed, here now is John at the bottom of this table. And just to be clear, if I view page source in the browser, you see all of the tr tags that we dynamically generated on the server side before they were sent as such to the browser. All right. What if we wanted to do something slightly more elegant here? Well, I don't have to just use this HTML hack. Why don't I just show the user who has registered automatically? And this is kind of a cool feature of web apps as well. In addition to importing Flask, render_template, and request, I'm going to also import a function called redirect that comes with Flask. And indeed, rather than just show success.html, I'm going to return the result of redirecting the user to /registrants. So to be clear, I'm in my register route, and instead of showing them the success page anymore, which I might as well delete at this point, I'm just going to redirect them to this list of everyone who is registered, including themselves. So, if I go back over here and type in someone like Doug, who maybe will play basketball with me, and click register, watch what happens to the URL at the very top of the screen: I'm automatically whisked away to /registrants in this case. I made a change to the code though, and so the server actually was smart enough to reload. So, Doug is now the only one in the database. And this actually hints at a problem we should really solve. In fact, let's do this real fast. Let me go ahead and register myself again for basketball. Register. Now, it's Doug and David. The catch, though, is if this server ever goes offline, maybe because it needs to be updated or it crashes or it reboots, or when you hit control-C and get back to your terminal, the Flask server is no longer running, which means that global variable called REGISTRANTS in all caps is gone. It's like free.
The memory has been freed. So, if I were to rerun Flask now, as would happen automatically if the server itself rebooted, well, this is not great, because if I go back to the registrants page and click reload, no one has registered. And in fact, that's what happened with Doug a moment ago. Because I changed my actual app.py, Flask was smart enough to realize, oh wait, the code has changed, I'd better reload the program, which gave me a brand new version of that global dictionary. So what would be better, clearly, than storing registrants in memory, in RAM, in a variable in the server? Yeah. Yeah. So in an actual database. And so here, too, is where everything kind of comes full circle and connects again. So let me go back into app.py here. I generally like the logic of what I've done. I don't like the fact that I'm just storing my registrants inside of this global variable, which is again just in the computer's volatile memory. Let's actually put this in a database instead. So, let me go up here and get rid of this global dictionary, and let me do something a little smarter up here. Let me import from CS50's own library the SQL function that we've used before. And again, even though we've been taking off almost all of CS50's training wheels, the reality is that using CS50's SQL library, even through final projects, just makes using SQL in Python so much easier. But there are certainly third-party libraries you can use. Let me go down now and, in addition to creating my app, let's create a database, db for short, setting that equal to SQL of sqlite:///, with three slashes, which is not a typo. And let's assume that the database shall be called froshims.db. More on that in a moment. And then down here, now that I have a database variable, let's not remember the student by storing them in this dictionary. Let's actually execute a line of SQL. So, db.execute, insert into... Well, wait a minute. What am I going to insert them into? Not to worry. I came prepared for this.
So, let me go ahead and maximize my terminal window and then run sqlite3 on a file called froshims.db. And this is a file I made in advance, but it's super simple. In fact, if I type .schema just to see the design of this database, you'll see that in advance I created a table in this database called registrants. It has a column called id, a column called name, and a column called sport. And the primary key of this table is the id value, which is just an integer. And now notice I have some constraints here. I want the user to give me a name and a sport. So I've specified that it's not just text, it's not null. That is, null values should not be possible to put in here. All right. So, let me go ahead and exit out of sqlite3. Let me go back into my code editor here. And now I know what to insert into. Insert into the table called registrants. What? Well, I want to insert a name of the student and the sport for which they registered. And the values, therefore, that I want to insert are going to be whatever came from the post request. Here's where you do not want to make yourself vulnerable to SQL injection attacks. No f-strings in here, just plugging the student's input in blindly. This is where and why we use these placeholders, in both CS50's library and in many libraries in the real world, to specify that I want the library to properly sanitize the user's input and escape any scary characters like apostrophes or semicolons or the like. So, I'm going to pass in name and sport. And this one line has the effect of, as you recommended, storing the registration in an actual database on the server, not just in volatile temporary memory. But we do have to change one thing. This line here is no longer valid because there's no global variable there via which we can get all of the registrants. But that's no big deal. Here's how most web apps would do this.
I'm going to define a variable called registrants and set it equal to db.execute of select star from registrants. It's as easy as that to just get all of the registrants from my database. And down here, there's no longer an all-capitalized variable, but there is a lowercase one, registrants. So, to be clear, in my register route, I am inserting the user into the database. And in my registrants route, I am selecting the users from the database. And then the rest of the code, I think, can stay the same. So, let's go back to Frosh IMs here. Go back to the form. Let's register David for basketball. Register. Ah, I did screw up. You're seeing some weirdness here. What are you actually seeing? There's one user registered. Not intentional. But what does this syntax suggest we're looking at? This is a dictionary. Recall that the db.execute method that comes with CS50's SQL library gives you a list of dictionary objects. And so because there's only one registrant at the moment, you're seeing the dictionary for my registration, which is not what I want to show here. And I forgot: I need to also go back into the registrants template to tweak my syntax as follows. Let me go back into VS Code here. Let me go into registrants.html. And because I am passing in now not a dictionary but a list of dictionaries, I just need to think about the problem a little bit differently. So my syntax here is going to be as follows. For each registrant in that registrants list of dictionaries, go ahead and display the current registrant's name and display the current registrant's sport. In other words, I'm using Python syntax, which works as well in Jinja here. This iterates over the list of registrants, each of which is a dictionary. So I'm using dictionary syntax now to index into the name key of the registrant dict object and the sport key of the same.
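The insert-then-select pattern can be sketched with Python's built-in sqlite3 module. This is a sketch under assumptions, not the lecture's code. The lecture uses CS50's SQL library against froshims.db, whereas this version swaps in sqlite3 with an in-memory database so it runs anywhere. The table definition is reconstructed from the spoken description of the schema, and the helper names register and select_registrants are hypothetical.

```python
import sqlite3

# In-memory database as a stand-in for froshims.db (an assumption).
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # lets each row be converted to a dict
db.execute(
    "CREATE TABLE registrants ("
    "id INTEGER, name TEXT NOT NULL, sport TEXT NOT NULL, PRIMARY KEY(id))"
)


def register(name, sport):
    """Insert one registrant, using ? placeholders instead of f-strings."""
    db.execute(
        "INSERT INTO registrants (name, sport) VALUES (?, ?)", (name, sport)
    )
    db.commit()


def select_registrants():
    """Return every row as a dict, i.e. a list of dictionaries."""
    rows = db.execute("SELECT * FROM registrants ORDER BY id").fetchall()
    return [dict(row) for row in rows]


register("David", "Basketball")
# A hostile-looking name is stored as plain text, not executed as SQL.
register("Kelly'); DROP TABLE registrants; --", "Soccer")

for registrant in select_registrants():
    print(registrant["id"], registrant["name"], registrant["sport"])
```

Because the values travel separately from the SQL text, the apostrophe and semicolon in the second name never get interpreted as part of the statement. The loop at the end mirrors the template change: each element is a dictionary, so the name and sport are looked up by key.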
So now let me go back to my browser, and I'm just going to reload the registrants page without resubmitting the form. Now there it is: David and basketball. And now let's go back to the form and register a couple more people. Kelly for soccer. Register. Notice we're at the registrants link. Kelly is indeed registered. Let me go back to this and let's register John. Ultimate frisbee. Register. Let's go ahead and kill the Flask server by going to my first terminal window. Control-C. And now let me go ahead and rerun Flask, which was bad before. That's how Doug ended up the only registrant last time. But this time, if I go back to the registrants page and immediately click reload, even though the server is running anew in memory, the database is persistent, which was the whole point of using SQL from week seven onward. And let's do one more for good measure. If I go back to the form, we'll register Doug so he can play basketball with me, too. And we even have Doug now in the database. It's an ugly-looking table, but the data is in fact all there. All right, questions now on this improvement, which is getting closer and closer to what the actual Frosh IMs website did so many years ago? All right. Well, let me propose this now. We have this table of registrants. Suppose that maybe Kelly was not very sportsmanlike when she played soccer last time. So, we want to deregister Kelly from soccer. That is: nope, we're going to reject your registration. Let's think for a moment about the design here. Here's an HTML table containing names and sports. And wouldn't it be nice if we could add a button that would let me deregister Kelly, or anyone for that matter? When I click on that button, what information should ideally be sent from the browser to the server to remove someone like Kelly from the database? >> ID. >> Yeah. The ID of the person. And you're proposing ID instead of name. Why?
>> The ID uniquely identifies the user in that SQL table. >> Exactly. The ID uniquely identifies the user in the SQL table. So, in fact, let's see this real quick. If I go back to VS Code, we'll revisit essentially a week seven issue here. Let me go back into my second terminal where I can again run sqlite3 after maximizing my terminal. Before, I just wrote .schema to see what the table is. Now I'm going to literally run select star from registrants in sqlite3, and we'll see a little ASCII table of all four of us who registered, but we also see the unique ID. And the value of the unique ID, recall from week seven, is that it's the so-called primary key. It is the value that uniquely identifies users as minimally as possible, and that's a good thing, because if we have another Kelly registering for Frosh IMs, we don't want to deregister the wrong Kelly or both Kellys. We want only the Kelly with ID of two. So somehow the button we add to the registrants page should contain in it the ID of the person we want to delete. Because if you do pass the ID of the person that you want to delete to the server, the server can do some kind of select, or rather some kind of delete statement, using that ID number and delete just that row. So there's a few ways we can do this, but let me propose that we proceed as follows. In our registrants route, which is where we can currently see all of these users, let's go ahead and output an ugly but functional form for each of those users. So, let me go ahead and minimize this and hide my terminal window. And in registrants.html, let's go ahead and just do this. In addition to outputting every registrant's name and sport, let's also output a third column whose purpose in life is to contain an HTML form. The action of that form will be a route like /deregister, and the method we're going to use is going to be post, just so that we don't accidentally store personally identifying information in a URL or such.
This form is going to have a button, the type of which is submit, and the button is going to say deregister. And I could now implement the ID in a couple of ways. I could do input name equals id, type equals text. And now if I go back to my other browser tab and reload, I should see a button for every one of these registrants. And I do. But this is kind of like the honor system, where I just let the user type in the ID of who they want to delete. And it's sort of weird that I have multiple forms in that case. But here is where dynamically generating HTML can get pretty useful. Let's change the type of this input to hidden and set the value of this input to be whatever the current registrant's ID actually is, storing this in here. And let's not confuse this, so we'll use single quotes on the outside instead. So inside of this value I'm putting the current user's ID. So, if I go back now, notice that the text boxes are going to disappear, but the buttons will not. But all of that information is still there. If I right-click or control-click and open up my developer tools... actually, let's open up view page source because it's just a bit bigger. Notice that David and Kelly and John and everyone else here has the same HTML as before, plus another column containing a form that contains a... I somehow messed up still. Why is this blank? So, this is still not good. Ah, thank you. I accidentally pluralized this, but it should be registrant, because I'm inside of this for loop and each iteration gives me a variable called registrant. So, user error on my part. So, let's go ahead and dramatically do this again. Let me view page source of the same page. Scroll down a bit. Thankfully, there is now for every one of these registrants a hidden ID: one for me, two for Kelly, and I bet if we keep scrolling, we'll see three for John and four for Doug.
So, now this form has enough information, even though there's no user input other than the clicking of the button, to tell the server whom to delete. So, how do we delete the user from that particular registration table? Well, I think we just need to add a route. So, let me go back into VS Code here, into app.py, and let's create another route. We'll put it up here, below index. So, app.route("/deregister"), and now def deregister, but I could call it anything I want. And how do I do this? Well, let's first get the ID from the form. id equals request.form.get("id"). Let's do a bit of a sanity check here. So if there is an ID and it's not blank for some reason, go ahead and do db.execute, delete from registrants where id equals question mark. And now let's pass in the user's actual ID. And then no matter what, let's redirect the user back to the registrants page so that we can hopefully see the result of that change. So again, I'm just using a bit of SQL per week seven. I'm using a placeholder by using the question mark, passing in the actual ID from the form. And I'm only doing this if there is an ID that was passed in. And I'm letting the database actually do the deletion. All right, so let's try to do this. Let's go back to the browser here. Reload the /registrants page for good measure. Let's decree that Kelly is now deregistered by clicking this button. And oh, so close. Method not allowed at the /deregister route. What did I do wrong? Let me go back to the code. What's wrong with my deregister route? Well, what method is the form using? If I go back to registrants.html, the method the form is using is post. >> Yeah. So, I need to override the default, which is get. So, I need to go up here again and just change an argument to be methods equals, and then a list containing only post now instead of get. All right, let's go back to the form and go back.
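The delete-by-ID step can be sketched with the standard sqlite3 module. This is a sketch under assumptions: the lecture uses CS50's SQL library and a Flask route. Here a plain dict stands in for request.form, the database is in memory, and the extra isdigit check is an addition of this sketch rather than something the lecture wrote.

```python
import sqlite3

# In-memory stand-in for froshims.db, seeded with the four registrants.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE registrants ("
    "id INTEGER, name TEXT NOT NULL, sport TEXT NOT NULL, PRIMARY KEY(id))"
)
db.executemany(
    "INSERT INTO registrants (name, sport) VALUES (?, ?)",
    [
        ("David", "Basketball"),
        ("Kelly", "Soccer"),
        ("John", "Ultimate Frisbee"),
        ("Doug", "Basketball"),
    ],
)
db.commit()


def deregister(form):
    """Delete the row whose id was submitted by the hidden form input."""
    registrant_id = form.get("id")
    # Sanity check: only act if an id was sent and it looks like a number.
    if registrant_id and registrant_id.isdigit():
        db.execute(
            "DELETE FROM registrants WHERE id = ?", (int(registrant_id),)
        )
        db.commit()
    # The real route would redirect to /registrants here no matter what.


deregister({"id": "2"})  # form values arrive as strings
deregister({})           # no id submitted: nothing is deleted

names = [row[0] for row in db.execute("SELECT name FROM registrants ORDER BY id")]
print(names)
```

Deleting by primary key rather than by name means a second Kelly, with a different id, would be left untouched.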
And now let's try to deregister Kelly. She's gone. Let's get rid of me now. I'm gone. And indeed, if I go back to VS Code, open my terminal, maximize it, and select star from registrants again, you'll see that the two of us are indeed gone in this case. Questions now on this technique? Because now we have most of the plumbing in place for adding people to a database and deleting people from a database. It's very similar in spirit now to most any website that has this kind of interactivity. All right, subtle question. I deliberately, in my registrants.html file, used post, as we just discovered, instead of get. Why, though? Because it wasn't that strong an argument that I hinted at earlier, that I don't want Kelly's ID or mine to end up in my URL bar. IDs are not really personally identifiable. They're just opaque integers at the moment. But why would it be bad if you could delete people by using the get method? So this is kind of subtle, but the catch with using get is that by definition you can visit that resource, that route, by just typing in a URL or following a hyperlink. So for instance, suppose an adversary were to type a URL like /deregister, question mark, id equals, oh I don't know, four, and then send me this URL in an email, or send this URL in an email to the proctor who's running the Frosh IMs program. If that proctor simply clicks naively on this link, and my code has used get instead of post, what's going to happen? >> Doug gets deregistered, just because the proctor followed a link in their email. And this is hinting at the kinds of phishing attacks that are possible too. Bad design. Generally, when you are using get requests, that is, just simple URLs that are clickable or typable, they should not have the effect of changing data on the server. Post is much better, if only because you can't just click a link and have a post happen. To induce a post request, you almost always have to click a button.
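The rule of thumb above, together with the earlier "method not allowed" error, can be sketched as a tiny routing table. This is an illustration only and not how Flask is implemented. The names ROUTES and status_for are hypothetical, and the status 200 simply stands in for "the request was handled".

```python
# Hypothetical routing table: which HTTP methods each path accepts.
ROUTES = {
    "/registrants": {"GET"},  # read-only, so safe to reach via a link
    "/deregister": {"POST"},  # changes data, so only a form submission
}


def status_for(method, path):
    """Return the HTTP status code a server with ROUTES would answer with."""
    allowed = ROUTES.get(path)
    if allowed is None:
        return 404  # Not Found: no such route
    if method not in allowed:
        return 405  # Method Not Allowed: route exists, wrong method
    return 200  # simplified stand-in for "handled"


# Clicking a link or typing a URL always issues a GET request.
print(status_for("GET", "/deregister"))
print(status_for("POST", "/deregister"))
```

Under this arrangement, a link in an email that points at /deregister cannot delete anything by itself, because following it issues a get, which the route refuses.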
So, at least in this case, the proctor would have to receive an email, click on a link, and then they would see a web page like this that clearly has a button labeled deregister or the like, which is an additional layer of protection. And there are even more attacks that you can wage by supporting get. So in general, post requests are preferred anytime there's anything remotely personally identifiable or remotely destructive, like actually changing data in the database like this. All right. Well, what more can or should we do with Frosh IMs, perhaps? Well, let's see. Maybe one or so final flourishes here. Suppose I want to make those error messages a little more interesting. Let's do that for just a second. Let me go back to my other browser tab here. Let's go back to the registration page where the form is, and let's deliberately not cooperate and just click register so that I get an error about missing name. Well, wouldn't it be nice if we made this a little more user-friendly by including an image on the page, as is commonly the case? Well, we can certainly include images in websites using the img tag, but the catch is we actually have to be a little more clever about how we store the image on the server in order for this to work. So for instance, let me go into that error page. We don't need success open anymore, and we don't need layout anymore, or this index anymore. Let's focus on error. And suppose that I did want to include an error message containing a grumpy cat on the screen. Well, ideally I would do open bracket, img, src equals, and then something like cat.jpeg, where cat.jpeg is the name of a cat image in this current folder. And just to be clear, let's have an alternative text of grumpy cat for screen readers or slow connections. Okay, this unfortunately is not going to work. Let's go over here and induce the same error by just reloading and submitting the same form.
And you'll see indeed a broken image, because that image, that cat.jpeg, does not exist, but we do at least see the alternative text. Well, I did come prepared with a cat already. And so, let me go ahead and grab this cat from another folder. And this cat is going to exist in a file called cat.jpeg. And indeed, if I type ls now after having grabbed a copy of that cat, it exists alongside app.py. Seems good. Let's go back to the browser here. Let's reload. And we should see... ah, still no cat. Well, why is this? Well, this is a side effect of using the framework as well. It turns out, for organizational sake, any images you want to display on a page, or any CSS files or JavaScript files that you want to embed in a page, if they're static assets, should actually be in a folder called static. And static just means unchanging. You or someone else wrote them once, and they're not dynamic in the way that app.py is. So, I'm actually going to use my mv command and move cat.jpeg into the static folder. Indeed, if I type ls now, cat is gone, but it is in the static folder. And now if I go back over here, I think we'll be good, except that I do need to go into error.html and say that the source of this image is actually /static/cat.jpeg to make clear it's in that folder. And so indeed, when I now reload the page once more, now I see a very grumpy cat, at least guiding my error message. Ah, but there is a difference here. Even though when accessing the static directory I have to be explicit, notice that this whole time we have never once mentioned the templates directory. The render_template function, to be clear, knows automatically to look in the templates folder for your template. You do not and you should not say something like templates here. You simply specify the name of the file. But in the HTML template, you do actually have to include, as I did, /static in the HTML. All right, let's do one final flourish with the actual code.
Suppose that it's time to modernize and let people register not just for one sport, as per the radio buttons, but for multiple sports. It's a little obnoxious to make me go back and fill out my name again and again and again if I want to register once, twice, three times for sports. So, why don't we go ahead and, in terms of UI, change those radio buttons to checkboxes? That's a very easy fix. Let me go into my templates folder and into index.html, where this form is. And if I want to change radio buttons to checkboxes, I literally just change radio to checkbox. If I go back to the browser here and reload, you'll see the familiar checkboxes now, which are not mutually exclusive. It lets me check multiple ones, thereby registering for multiple sports at once. But my logic has to change a tiny little bit here: if I want to get all of the sports for which the user is registering, well, that logic has to change in app.py. So where is my register route? Down here. And we haven't touched this in a while, but recall that the register route here has a validate name chunk of code, a validate sport chunk of code, and we most recently did the INSERT INTO chunk of code as well. But if the user is registering for multiple sports, I'm okay with having one row per sport, even though I'm sure we could do better than that. But how do I iterate over all of the sports that the user gave me? Well, I need to change my validation code here a little bit. If you know the user can select multiple values, as with checkboxes, you're going to use request.form.getlist and then the name of the parameter that you want to get the values of. And then this is going to give me back a list of values. So I'm going to go ahead and change my code semantically to say sports, because I'm expecting zero or more sports now instead of one. So if there are no sports, we're going to just say missing sport. Heck, missing sports. But then I can't simply do this.
I can't just ask whether the sport for which the user registered is in that array or not, because they might have given me two sports or three. So logically I should really check all of the sports that the human checked for me, and I should probably do something like this instead. So for each sport in the sports that the user submitted, go ahead and ask the question: if that sport is not in the list of valid sports, then go ahead and output invalid sport. So it's just a bit of tedium here. We're just adding a bit of logic, but this way I'm iterating over every checkbox that the user checked and making sure they didn't do what Kelly did earlier and sort of make up her own sport and submit that to me among all of the others. But this now should let me... let's try. Let's reload. Oh, and then actually one other line here. We also need to do it down here. For each sport in sports, we'd better execute that line of code multiple times. So, let's see what happens. Let's go ahead and register David for... actually, let's see who's in the database still. So, registrants. So we've got John and Doug. No David or Kelly. So let's reregister David for basketball and soccer. Click register. And now I'm indeed registered for both. And I observe that it's kind of bad design that I'm just inserting myself twice into the database. So let me go ahead and open up the Frosh IMs database one last time. Let me do a SELECT * FROM registrants. You'll see too that David and David are both there. What would be a better design here, to get rid of the redundancy and to know that I'm the same person, ideally? Yeah. >> Yeah. I should probably have an ID for the person as well. So this is going to complicate it more than we want to play with today. Instead of just a registrants table, I should probably have a students table that has an ID for every student and the name of every student, and then change this table, as we've seen with the IMDb database and others.
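The checkbox handling just described (collect every checked value, validate each one, then insert one row per sport) can be sketched with only the standard library. Here parse_qs stands in for Flask's request.form.getlist, and the list of valid sports, the table layout, and the register helper are assumptions made for illustration rather than the lecture's exact code.

```python
import sqlite3
from urllib.parse import parse_qs

# Hypothetical list of sports the form offers.
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]


def make_db():
    """Create an empty in-memory registrants table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE registrants (id INTEGER PRIMARY KEY, name TEXT, sport TEXT)")
    return db


def register(body, db):
    """Validate a submitted form body and insert one row per checked sport."""
    form = parse_qs(body)  # each key maps to a list of every value submitted for it
    name = form.get("name", [""])[0]
    sports = form.get("sport", [])  # plays the role of request.form.getlist("sport")

    if not name:
        return "Missing name"
    if not sports:
        return "Missing sport"
    # Check every submitted sport before inserting anything.
    for sport in sports:
        if sport not in SPORTS:
            return "Invalid sport"

    for sport in sports:
        db.execute("INSERT INTO registrants (name, sport) VALUES (?, ?)", (name, sport))
    return "Success"


if __name__ == "__main__":
    db = make_db()
    print(register("name=David&sport=Basketball&sport=Soccer", db))
    print(db.execute("SELECT name, sport FROM registrants").fetchall())
```

Validating the whole list before the first INSERT means a made-up sport cannot leave a partial registration behind.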
I should really be storing the IDs of the students, the Harvard IDs if you will, and not just their names like this. So, there's room for improvement, but the point here is just how we can actually use checkboxes and get back multiple items from folks. All right, that was a lot. Questions on where we're now at? All right, to make the coding a little less tedious, what we're going to do is look at a few final examples that have come pre-made, and we'll walk through the code, pointing out only what's different as opposed to some of the boilerplate that we keep seeing. Where we left off now, recall, is that we have app.py, which is all of our logic; requirements.txt, which just enumerates the libraries that we want to use in the project; static, which now contains any static files like cats or JavaScript or CSS; and templates, which contains our actual templates. It's worth noting that we're actually following a fairly common paradigm. This is not specific to Flask. The paradigm that we've essentially been implementing is this. If this shape over here represents the human or the user, they keep interacting with what the world generally calls a view. A view is the term of art that just describes the user interface. But that view is generated by a certain type of code, namely controller logic. So app.py is technically what the world would call controller logic, or business logic, to use an industry term. And that controller code, aka app.py, is generating one or more views. So the views that we're referring to here are everything in your templates. Those are your views. But there's a third piece of the puzzle that we just introduced, which is generally called a model. And initially my model was just a stupidly simple dictionary in memory, and that evolved eventually into froshims.db. So your model is generally your persistent data, where you're storing data related to the application.
And even though the picture doesn't lend itself to pronouncing it in the right order, this is what's known as the MVC paradigm: model, view, controller. And it's a very common way of developing web apps by just thinking about the different problems you need to solve with this kind of nomenclature. I've got to implement my controller, which does all of the logic: all of the variables, functions, conditionals, loops, and so forth. I've got to implement the view, which contains everything the user sees and interacts with, like the HTML. And I've got to eventually implement the model, which is all of the backend database and such. The catch, though, is that this is not a clean line, because clearly in views we've seen variables, we've seen loops, we've seen conditionals. So this is just a general mindset to have, and in the real world, if you ever explore web apps again, you are henceforth familiar with what's known as this MVC model. But now let's solve some other real world problem. So here's what you see on the occasion that you sign into something like Gmail, or really any other website that asks for a username and then eventually a password or some such thing. This is just a web form. It looks a lot prettier than mine because they're using some fancy CSS to make things blue and nicely indented and so forth, but it's just HTML underneath the hood, with probably an input whose type equals text to give me this text box. Of course, when you log into Gmail after providing your password, somehow Gmail remembers, often for days, weeks even, that you have logged in already. Now, how is that actually working? Well, when you first log into a site like Gmail and click submit, or the next button in this case, presumably the browser is submitting in a virtual envelope, so to speak, a message like this to Google's servers: POST /something to accounts.google.com, which happens to be the URL that Google typically uses for this.
And inside of this, the dot dot dot is your username and password and anything else that might be submitted to the server. Ideally, the server responds to you with 200 OK, like, here is your inbox. Okay, you logged in successfully. But underneath the hood, every time you've been logging into Gmail, it has also been planting a cookie on your computer. And you might be generally familiar with cookies. They have kind of a bad rap because they are used quite frequently for tracking, for advertising, and really kind of keeping eyes on you in some way. But in their basic form, they're just a feature of HTTP, which is wonderfully useful because it solves some typical problems. This is another HTTP header that is usually inside of those virtual envelopes that come back from servers to browsers. In addition to telling the browser what the type of content is in the envelope, it might tell the browser, please set the following cookie. A cookie is just a key value pair. It might be something like session, literally, equals some value. And that value is usually a random string that might be 1 2 3 4 5 6 or something like that, but it's a unique identifier. Or naively, if Google implemented cookies poorly, they could technically tell your browser to store a cookie on your computer containing your username and a password. Why? So that tomorrow when you open up Gmail, you're not prompted again with the stupid form to log in. Your browser already knows that you're logged in. And your browser can prove that by just sending the same cookie it got yesterday to the server. Now, it's generally bad to use cookies to store usernames and passwords, because it's putting very precious data in the browser's memory, and any sibling or roommate who walks over to your browser can now find your username and password by just poking around your cookies.
So generally what servers do is more like this screenshot here, whereby all the server does is put a big random value on your computer somewhere, essentially a text file containing a big random value, and that is equivalent essentially to a handstamp. If you go into a bar or a club or an amusement park, generally you show your ticket once when you go in, and then thereafter you just show your hand if you want to be able to come and go again and again. So right now my hand has not yet been stamped. We have this nice smiley face sticker here. I might have a smiley face now on my hand anytime I want to go back into the bar or club or amusement park, because they now know, oh, we already checked who you are, presumably the very first time that you came in. That's all cookies are effectively doing: putting a virtual handstamp in your browser. The next time you go to Gmail and click on a link or click on an email, your browser, unbeknownst to you, will send a GET request that looks like this but also contains a line like Cookie, colon, and then that same key value pair. It's like presenting your handstamp again and again every time you open an email or click on a link in Gmail. This Cookie header is what the browser sends. This Set-Cookie header is what the server sends. So this is the act of stamping your hand. This is the act of presenting your hand. And that effectively is how browsers and servers remember who you are. This is how advertisers generally remember who you are, because at one point or other they put a cookie on your computer and, unbeknownst to you, you're going to this website, this website, this website, and your browser has been presenting this handstamp all this time, so advertisers know, oh, that's David again, that's David again, and that's David again, because they're seeing the same handstamp.
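The two headers described above, Set-Cookie for stamping the hand and Cookie for presenting it, can be sketched with Python's standard http.cookies module. This is a rough illustration, not how Gmail or Flask actually implements it; the function names and the cookie name session are assumptions for the example.

```python
import secrets
from http.cookies import SimpleCookie


def stamp_hand():
    """Server side: mint a big random value and build the Set-Cookie header."""
    value = secrets.token_hex(16)
    cookie = SimpleCookie()
    cookie["session"] = value
    return value, cookie.output()  # header text of the form "Set-Cookie: session=..."


def present_hand(cookie_header):
    """Server side: read the value back out of the Cookie header a browser sent."""
    _, _, pairs = cookie_header.partition(": ")
    cookie = SimpleCookie()
    cookie.load(pairs)
    return cookie["session"].value if "session" in cookie else None


if __name__ == "__main__":
    value, set_cookie_header = stamp_hand()
    print(set_cookie_header)
    # On later requests the browser echoes the same key value pair back.
    browser_header = "Cookie: session=" + value
    print(present_hand(browser_header) == value)
```

The value carries no personal data by itself; it is only useful because the server remembers which visitor it handed that value to.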
And so one of the reasons why last week, for instance, I kept opening things in incognito mode, which you might use generally if you want to do something private and not have it be saved in the computer's memory, is also because incognito mode gets rid of all of your cookies when you close the window, effectively wiping off the handstamp before the next time you go to that same website. So that's all a cookie is. It's a key value pair that can be planted on your computer, but it's a wonderfully powerful mechanism for implementing, and this is the juiciest idea for today, I'd argue, what are called sessions. Sessions are this feature whereby browsers and servers have a persistent connection to each other, even though HTTP is what we'll call stateless. So stateless just means that you don't have a constant connection to the server when you are using a website. And that's not always true. Nowadays you sometimes do have a persistent connection, but cookies allow you to close your laptop, even shut down your computer, come back the next day, and still have the illusion of being connected just as you were the previous day, because of this virtual presentation of handstamps. So a session, more concretely, you can think of in Python as a dictionary of key value pairs that you can associate with each and every user. That is to say, when I log into a website that is using sessions implemented with cookies, they can store any number of key value pairs about me in the server's memory. And my presentation of the handstamp will ensure that they know which key value pairs to assign to me. Let me go back into VS Code here and let me cd into a directory with which I came, which is called login, which is just going to be a relatively simple Flask application that demonstrates how you can implement the ability to log into a website. And we'll keep it super simple with just usernames, no passwords.
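The idea of a session as one dictionary per user, looked up by handstamp, can be sketched as follows. This is a simplified model and not Flask's implementation; SESSIONS and get_session are hypothetical names, and a real server would also expire old entries and keep them somewhere more durable than a global variable.

```python
import secrets

# Server-side store: cookie value -> that visitor's own dictionary.
SESSIONS = {}


def get_session(cookie_value):
    """Return (cookie_value, session dict), minting a new pair for unknown visitors."""
    if cookie_value not in SESSIONS:
        cookie_value = secrets.token_hex(16)
        SESSIONS[cookie_value] = {}
    return cookie_value, SESSIONS[cookie_value]


if __name__ == "__main__":
    # Two browsers arrive with no cookie yet, so each gets its own stamp.
    david_cookie, david = get_session(None)
    kelly_cookie, kelly = get_session(None)

    david["name"] = "David"  # logging in: remember who this visitor is
    print("name" in kelly)   # Kelly's dictionary is unaffected

    # A later request that presents the same cookie finds the same dictionary.
    print(get_session(david_cookie)[1].get("name"))
```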
But as you'll see in problem set 9, we'll add some passwords to the mix as well. If I type ls inside of this login directory, you'll see some familiar friends: app.py, requirements.txt, and templates. But let me draw our attention to one other library we're going to now start using, called Flask-Session. So Flask-Session is just a third party library that gives us the ability to use cookies in our application and not have to know or understand any of the screenshots we just saw of HTTP requests. It sort of suffices to stipulate, okay, someone figured out how cookies work. I just want to use them now as a feature so that when a user uses my website, I can associate data with them, like who they are, what their username is, and therefore that they've logged in. So, let's go ahead and close requirements.txt and open up app.py in this case. Here is an implementation of a program whose purpose in life is to enable me to log in. And in fact, before we walk through the code, let me do this: in this terminal, let's do flask run. And I already hit Control-C in my other terminal window a moment ago. Let me now go into my other tab up here and reload the slash route, which is now going to be this login example instead of Frosh IMs. All this website does by default is tell me first, you are not logged in, but here's a link to log in. It's a little small, but if you look in the bottom left-hand corner of my browser right now, it's a URL that ends with /login. And in fact, I can see that more clearly if I view page source in the browser. Here is the only thing I'm really seeing in this web app so far. But notice what happens now. If I click on login, the route in my URL just changed to /login. I'm again keeping it simple with just usernames, no passwords, but I'm going to log in as David and click login. But first, let me show you the code. In view page source, I have a form that submits to /login using the POST method.
The only thing about that form that's interesting is it's got a text box and a login button. Same as we've seen before. So, let's click it. Now, I click login. And notice I get whisked away back to the original route, the slash route, even though Chrome is hiding the slash from me. But the website somehow knows that I'm logged in as David. In fact, if I open up my page source in the browser, I'll see that now it doesn't say you are not logged in. It says I am logged in as David. And it's now giving me, apparently conditionally, a logout link. So I argue this is representative now of any website that lets you log in and out of it. So how does this work? Well, in my login app here, what do we have in app.py? The following. I've got from flask import Flask, redirect, render_template, request, and a new one, session, which you can essentially think of as a dictionary where you can store key value pairs for each and every user, and Flask will make sure that your code has a different copy of session for every user that visits. You can just treat it as though you only have one user, but Flask will ensure that when a user visits, they get their own copy of session, their own copy of session, their own copy of session, essentially, to store whatever you want. This next line here, I just need to copy-paste: from flask_session import Session, with a capital S. This line is the same: turn this file into a Flask app. This stuff is new and fine to copy-paste. This just says configure this app to use sessions by storing the session data on the server as files instead of in a database or somewhere else. But this is the default that we use for our examples. All right, what's going on here? Well, in my slash route, I've got an index function whose purpose in life seems to be to render a template called index.html and then pass in a name placeholder, which is the value of session.get("name").
So whatever name is stored in the session, if any, gets passed into the template. So let's go down this rabbit hole. Let me open up index.html. Interesting. So here is the logic that implemented those two different versions of the homepage that we saw. If the name has a value, so if it's not empty, we saw you are logged in as such and such. Here's a logout link. If, though, there was no name, as happens by default before you even log in, you see you are not logged in. Here's a link to log in. So that's all the homepage is: conditional logic checking if there is in fact a user logged in. All right. Well, let's go back to app.py. How does the login work? Well, if you find your way to the login route, then I'm asking a question. If the user got here via POST, they probably got here by clicking the login button that I gave them. So, let's store in the session dictionary the word name and make the value of that key this value here, where what I've just highlighted is whatever the user typed into the form, whether it's David, Kelly, John, or anyone else. That's what comes back from the form, and I'm just storing that in the session, which again is like this special global variable that you get one of per user, and it's implemented underneath the hood by way of cookies, or these handstamps. Then I'm just redirecting to the slash route. Otherwise, if the request method wasn't POST, that means the user just newly visited example.com or whatever my website is. That's why I show them login.html. All right, let's go down that rabbit hole. Let's open up login.html. It's pretty simple. It's just a stupid form that has a text box and a submit button. But the most important part is that, as we saw in the browser, it submits to /login, the route we just saw. All right, if I go back to here, how do you log out? Well, we didn't actually click this, but here is how you can delete the contents of the session and actually log the user out. You just call session.clear.
And so, in fact, if I go back over here and click log out, how does the server know that I've logged out? Well, that route, very quickly, so quickly you didn't even see the URL bar change, logged me out by clearing the whole session. And so, the cookie that was planted on my computer was essentially deleted at this point in time. Or really, the server side data that's associated with that cookie was deleted. So, I'm no longer seeing it at all. So, that's kind of it. If you log into a website, whether it's Facebook or Gmail or Outlook or anything else, that's effectively how they're logging you in, but of course, they're adding into the mix some passwords and other security as well. All right, how about one other example? Let me go back into VS Code here, and let me go into my first terminal and hit Control-C to kill this login example. Let me hit cd to go back and then cd store to implement the simplest of web stores, like some kind of e-commerce site that has an actual shopping cart implemented. Let me do flask run inside of this directory. Open up my other terminal window. And in my other terminal window, I'm going to go cd to go back and then go into store here, where I'm going to see some familiar files, namely app.py and requirements.txt, but a database file this time in addition to my templates. Well, let's see what's inside of that database. Let me go ahead and run sqlite3 on store.db and then .schema to see what's in the database. Ah, this is like a bookstore, like the very first version of amazon.com if you will. And the table has two columns, an ID column and a title column, for all of the books that this store shall sell. Well, what are those books? SELECT * FROM books, semicolon. Okay, so this is a bookstore that sells only five books, among them The Hitchhiker's Guide to the Galaxy and sequels. All right. So, wouldn't it be nice if we had a website that displays everything in this catalog and lets me add things to my cart?
And in fact, here is maybe the better metaphor for what a session is. A session essentially gives you the ability to implement a shopping cart like this, where the shopping cart, of course, in the real world is specific to each user. If I'm on Amazon.com and Kelly's on Amazon.com and we're both logged in, we obviously don't see the contents of each other's carts. And that's because we have separate cookies on our hands. And so Flask, or whatever Amazon is using, creates the illusion that we each have our own global dictionary called session in which Amazon can store any key value pairs it wants, like what's in our shopping cart. So let's try this. Let me go back to my other browser and reload. So I'll now see not the login example but the bookstore example. And it's super ugly because I whipped it up using the simplest of HTML. But you'll see here every one of the books in the database plus an add to cart button. And even if, again, you're sort of new to all this web programming, there's not all that much you can do with HTML except use forms, maybe with some hidden elements, to achieve this result. So here we have the h1 tag with books. Here's an h2, which is big and bold but not quite as big. Here's the form. Here's the button for The Hitchhiker's Guide to the Galaxy. As an aside, because there's a curly quote or an apostrophe in the book's name, this is just an HTML entity that Flask is outputting for me, even though it's not there visually in the database. So, what does the button do for The Hitchhiker's Guide to the Galaxy? Well, it's a form whose action is /cart, presumably because I want to add it to my cart, using the POST method. I've got an input whose name equals id, the type of which is hidden, the value of which is one. And fast forward: 2, 3, 4. So just like the deregister example for Kelly, each book is similarly going to be addable to a cart, instead of removable, by using that unique ID. And indeed, every form has an add to cart button.
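The page source just described, one small form per book with the book's ID tucked into a hidden input, can be approximated in plain Python. In the lecture's app a template does this work; here html.escape stands in for the template engine's automatic escaping (it emits its own entity for the apostrophe), and the render_books helper and button label are assumptions for illustration.

```python
from html import escape


def render_books(books):
    """Build an HTML fragment with one add-to-cart form per book."""
    parts = ["<h1>Books</h1>"]
    for book in books:
        # Escape the title so characters like apostrophes become HTML entities.
        parts.append(f"<h2>{escape(book['title'])}</h2>")
        parts.append(
            '<form action="/cart" method="post">'
            f'<input name="id" type="hidden" value="{escape(str(book["id"]))}">'
            '<button type="submit">Add to Cart</button>'
            "</form>"
        )
    return "\n".join(parts)


if __name__ == "__main__":
    catalog = [
        {"id": 1, "title": "The Hitchhiker's Guide to the Galaxy"},
        {"id": 2, "title": "The Restaurant at the End of the Universe"},
    ]
    print(render_books(catalog))
```

The hidden input is what lets the server tell the otherwise identical buttons apart: the human sees the title, while the form submits only the ID.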
So what's happening then on the server? Well, let's take a look at the other tab here. If I go back into VS Code, let's minimize the terminal window here and let's open up, inside of store, our template for index.html, which is sort of the entry point. Oh, which is not that. Let's open up app.py first and figure out what's going on. So at the top we have some imports, including our SQL library. We have an app variable being created, and a db variable being created using that same store.db. We've got this boilerplate code, which again just enables sessions and stores the contents on the local file system instead of in a database. Ah, here's the interesting beginning point. How did I see that big page with all the books and the buttons? Well, for the slash route, we've got this function that first uses some SQL to get all of the books from the database: SELECT * FROM books. And then, ah, there's no index.html because I called it books.html in this case, just because. And I set the books placeholder equal to the value of the books variable. All right, let's go down this rabbit hole now. Let's open up the templates folder's books.html file. Okay, so here we have that h1 with books, and then we have a for loop which is going to output for every book an h2 tag and a form tag, a form tag again and again and again, each of which has a value that equals the current book's ID, but the title in the h2, of course, is the title of the book, which is more human friendly. So what happens when I actually click on add to cart for The Hitchhiker's Guide to the Galaxy? Well, I should indeed see that now that one book has been added. And if I go back and add another, like The Restaurant at the End of the Universe, I now have two books in my cart. So, where is that data actually being stored? Well, if we go back to VS Code here, hide the terminal, and focus on the cart route.
The cart route, because it supports POST in addition to GET, is also doing this for me. Well, first it's checking with some logic here. If there is no cart in the session, go ahead and create a key called cart and set it equal to an empty list. In other words, I can put any key value pairs into the session that I want. So, if I want my shopping cart to effectively be a list of all of the books that the user has added to their cart, it stands to reason that my cart by default should just be an empty list when they first arrive. However, if the user has clicked submit in order to get here, well, I'm going to do this. I'm going to get the ID of the book that they've submitted via that form. And if it indeed exists, and it's not someone like Kelly messing around and sending me invalid parameters, I am going to append the book ID to the cart list in the session. And then I'm just going to redirect the user to the cart. And anytime you do a redirect, that always is using GET, not POST. And so when I come back to this cart route later, I'm not going to be using POST. I'm going to be using GET, which means this chunk of code here is executed. I have a variable called books. I set it equal to the results of doing SELECT * FROM books WHERE id IN the following parenthesized list of IDs. Recall that IN is the preposition that lets me match multiple IDs if I so choose. And then I'm rendering cart.html with those books. And that's the reason why I'm seeing two elements here if I go back to the application. And indeed, if I go to view page source, I'll see two list items inside of an ordered list, or a numbered list, containing the contents, then, of that shopping cart. All right. So, if we now have the ability to use sessions to remember who has logged in, and we have the ability with sessions to remember what someone has added to their shopping cart, what else can we do with web applications more generally, even if not using sessions?
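The cart logic just described (create an empty list on first use, append validated IDs, then look the books up with IN) can be sketched with a plain dictionary standing in for the session and Python's sqlite3 module standing in for the course's SQL library. One difference is worth flagging as an assumption of this sketch: sqlite3 needs one question mark per ID, so the placeholder list is built by hand. The function names are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory catalog resembling the five-book store."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
    titles = [
        "The Hitchhiker's Guide to the Galaxy",
        "The Restaurant at the End of the Universe",
        "Life, the Universe and Everything",
        "So Long, and Thanks for All the Fish",
        "Mostly Harmless",
    ]
    db.executemany("INSERT INTO books (title) VALUES (?)", [(t,) for t in titles])
    return db


def add_to_cart(session, submitted_id):
    """POST branch: make sure a cart exists, then append a well-formed ID."""
    if "cart" not in session:
        session["cart"] = []
    try:
        session["cart"].append(int(submitted_id))
    except (TypeError, ValueError):
        pass  # ignore missing or made-up parameters


def books_in_cart(db, session):
    """GET branch: fetch the rows whose IDs are in this visitor's cart."""
    ids = session.get("cart", [])
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)  # one ? per ID, never the raw values
    query = f"SELECT id, title FROM books WHERE id IN ({placeholders}) ORDER BY id"
    return db.execute(query, ids).fetchall()


if __name__ == "__main__":
    db, session = make_db(), {}
    add_to_cart(session, "1")
    add_to_cart(session, "2")
    print(books_in_cart(db, session))
```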
Well, let me go ahead and close this tab here. Let me go back to VS Code here, close out these two examples, and let's do a final set of examples that demonstrate what we can do with some real world data and a web application. I have lastly a directory called shows, which is evocative of our use of IMDb in the past. And I'm going to go ahead into my first terminal window, hit Control-C, and call your attention to one thing before we move on. Every time I have executed a SQL query inside of my code, in my first terminal window where Flask is running, you'll see, either in green for success or in yellow or red for some issues, the actual SQL commands that are being sent to your database. This is useful if you mess something up at some point related to a database query. You can see in the terminal where you're running flask run exactly what SQL command was sent to the database, and try to troubleshoot errors that way. Otherwise, you're just flying blind when interacting only with the web browser. But for now, let me go ahead and clear that away and cd back to my default directory and cd now into shows, where if I type ls, we'll see a whole bunch of files: app.py, requirements.txt, and this time shows.db, which is the very same database that we had in past weeks when we played with some of the very large number of shows in the Internet Movie Database. And what does app.py do here? Well, it implements the simplest of programs. This gives me access first to shows.db with some boilerplate up top. If I scroll down here, you'll see that there's an index.html template that's rendered by default. And then apparently there's a search route, which is akin to what Google does for us when we searched for cats and dogs in the past. But for the first time I'm implementing my own search engine for TV shows, not for dogs and cats. But what does this search route do?
Well, it uses a shows variable, and it executes the SQL SELECT * FROM shows WHERE title equals question mark, and it passes in, just like Google does, the q parameter for query, and then it renders a template called search.html, passing in those shows as a placeholder. In other words, what does this do? Well, let me go back over to the store tab here. Change the URL to just slash. And because I'm no longer running the store, I do want to go ahead and run flask run in my first terminal window to start off the shows application instead. So if I now go back to that tab, now that a server is running, what I see here is the simplest of search boxes, like our Google example asking for a query, but this time I can search for things with which I'm more familiar, like The Office, capital T, capital O. Search. And what I get back, not that enlighteningly, is the title of every show that matches exactly that. If I go ahead and view page source, you'll see that what was generated was an unordered list of Offices that are in the database. And recall there's the British one, the American one, and a bunch of others as well. However, this form does not always work. If I type in something like the office, all lowercase, and search, I get no results in that case, which isn't so much a bug as it is just a lack of features here. And so, let me actually go into VS Code here, and let me propose that we come up with a better version of this code. So, in fact, I'm going to go into the pre-made examples with which I came today. I'm going to go into the next version of shows here. Run flask run here. Reload the application over here, and now show you that the office in lowercase does actually work. Moreover, it searches for anything that mentions the office.
So if you had to guess how this might be implemented underneath the hood, well, if I open up my other terminal window and go into that same directory, shows one, and open up this version of app.py, you'll see that instead of using a simple query like before, I'm now using the LIKE keyword here, because I'm checking that it is like the office. And notice, this is a bit clever, or a bit confusing at first glance: the placeholder I want is a question mark, but I don't want to just search for the user's input. I want to tolerate zero or more characters to the left, via the SQL wildcard, and zero or more characters to the right. So I'm concatenating onto the user's input a percent sign here and a percent sign here because, recall from our week 7 with SQL, this just means look for anything, case-insensitively, that has "the office" in it, no matter where that string is in the text. How did it know to render that, though, as this bulleted list of all of these offices? Well, let me go into my terminal here and open up search.html, which is the template that the search route is using. And you'll see that I'm just iterating, with a Jinja for loop, over each of those shows, and then outputting a list item for each of those matches, effectively just as I did before. But there's this other technique I can use altogether, and it's generally going to open up more possibilities for us in final projects, if not beyond: creating essentially my own API. Rather than just make a web app that spits out the entire HTML page that I want the user to see, wouldn't it be nice if I could start to create routes that spit out the data that I want, and then I, or even some third party making a website with the same data, can integrate my application into their own? And indeed, an API is an application programming interface. It's essentially web-based functions you can call to get data from someone else's services, generally using HTTP.
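As a rough sketch of the difference between the exact-match query and the LIKE query described above, here is a small example that uses Python's built-in sqlite3 module rather than the CS50 library used in lecture. The in-memory table, the sample titles, and the function names search_exact and search_like are all assumptions made for illustration; this is not the lecture's actual app.py.

```python
import sqlite3

# Hypothetical stand-in for shows.db: an in-memory table with a few sample titles.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [
        ("The Office",),
        ("The Office: The Accountants",),
        ("Office Girls",),
        ("Cheers",),
    ],
)


def search_exact(q):
    """Exact match: = compares the whole title, case-sensitively by default in SQLite."""
    rows = db.execute(
        "SELECT title FROM shows WHERE title = ? ORDER BY id", (q,)
    )
    return [title for (title,) in rows]


def search_like(q):
    """Substring match: a % on each side tolerates zero or more characters there."""
    rows = db.execute(
        "SELECT title FROM shows WHERE title LIKE ? ORDER BY id",
        ("%" + q + "%",),  # the wildcards are added to the value, not to the SQL text
    )
    return [title for (title,) in rows]


print(search_exact("The Office"))
print(search_exact("the office"))
print(search_like("the office"))
```

Keeping the question-mark placeholder and concatenating the percent signs onto the value, as the lecture does, means the user's input is still passed as data rather than pasted into the SQL string itself.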
And you can return the data in any number of formats: in text format, in HTML format, or in something called JSON format, which is short for JavaScript Object Notation, which looks a little something like this, which is quite like Python arrays and dictionaries combined. But notice here, with a wave of the hand, there's a whole bunch of key-value pairs in this particular example of all of the offices that are in IMDb's database. And so I wanted to show us these final versions of this same shows application that work a little bit differently. If I go into, say, the shows two example here and now run, whoops, let's go ahead and exit out of the previous Flask copy, and go into shows two, inside of which I'll run flask run. Notice here that if I go back to this web form now, there is no more search button, because this is meant to be highly interactive, and I can search for the office. And you'll notice that this is effectively autocomplete, which we saw a taste of last week with JavaScript, which I am in fact using here. But how is this working? Well, let me reload and open up my developer tools. And in developer tools, let's watch the network tab this time, because when I type in something like t, you'll see that my web page suddenly made a request to my own /search route. And if I click on my developer tools and look at the response that came back, you'll see that the /search route spit out not a full web page, but just a whole bunch of LI tags. Now, why is that? Well, let me go back to VS Code and open up, in my other terminal, app.py. And in app.py, scrolling down to search, you'll see that when I get shows from the database, I'm still using search.html, which previously extended my layout and plugged in that whole unordered list.
But this time, if I go into this version of search.html, you'll see that I'm only spitting out raw HTML, because I'm assuming that maybe someone, myself included, wants to use /search to just get a whole bunch of list items that they can put into their own unordered list or UL tag. And so what's effectively happening over here is every time I type a letter, notice at bottom left, another HTTP request goes across the internet, another HTTP request, and each of those is returning the set of LI elements that line up with the query that I've typed in. But this is a little sloppy, arguably, insofar as I'm returning a chunk of HTML, but out of context, and I'm dictating to the user that they have to use list items. Wouldn't it be nice to just send the raw data? And I can do that, too. Let me go back into VS Code here and look at our final example, shows three, inside of which is a version of this code that now returns that so-called JavaScript Object Notation. And if I go into shows three, run flask run, go back over now to my browser tab, and click reload, I'll see now when I search for, say, t and click on that row, notice now in the response tab of my developer tools, I'm getting back a whole bunch of juicy information: a massive JavaScript Object Notation chunk of data. Notice the square bracket means here comes a list or an array. The curly brace means here comes a dictionary or dict. And indeed, that's what I'm seeing. This looks like Python, but it's technically JavaScript, and it's technically JavaScript's object notation. This just means this is the juicy data I'm getting back from the server. And if you now think way back to week zero, and even our family weekend lecture on AI, where I was writing code that talked to OpenAI's so-called API to get responses from our server-side cat, they were sending us JavaScript Object Notation like this, and I was just grabbing the data that I actually cared about, namely the cat's actual response.
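The two response styles compared above, a fragment of list items versus raw JSON, can be sketched with nothing but Python's standard library. This is only an illustration under stated assumptions: the sample rows and the helper names as_list_items and as_json are hypothetical, and the lecture's own application produces these responses through Flask rather than through these helpers.

```python
import html
import json

# Hypothetical rows, shaped like the list of dictionaries a SQL query might return.
shows = [
    {"id": 1, "title": "The Office"},
    {"id": 2, "title": "Tom & Jerry"},
]


def as_list_items(rows):
    """HTML fragment: ready to drop into a <ul>, but it dictates the markup."""
    return "".join(f"<li>{html.escape(row['title'])}</li>" for row in rows)


def as_json(rows):
    """Raw data: square brackets for the list, curly braces for each dictionary."""
    return json.dumps(rows)


fragment = as_list_items(shows)
payload = as_json(shows)

# A consumer parses the JSON text back into lists and dictionaries
# and keeps only the fields it actually cares about.
titles = [show["title"] for show in json.loads(payload)]

print(fragment)
print(payload)
print(titles)
```

The fragment is convenient only for a page that wants list items, whereas the JSON text leaves the presentation up to whoever receives it, which is the trade-off the lecture is pointing at.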
And so in this case, if I open up in my other terminal window here app.py, you'll see in my search route that instead of returning a template, I'm using a crazily named function called jsonify, which is just another function that comes with Flask itself, that has the effect of taking the list of Python dictionaries that came back from my SQL database and JSONifying it in such a way that I can then serve it to anyone on the internet, myself included, as a service, so that I and they can use my own data to implement their own web applications. So that's sort of it for web programming. Ultimately, you now have all of the building blocks from week zero onward to make your own web applications and, if you so choose for final projects, your own mobile applications. Even if this too, like everything else, has felt like a bit of a fire hose, it is the process of speccing out and proposing and executing your own final project that will make all of this feel much more comfortable and familiar. And you'll look back on so many of the past weeks as useful building blocks. But this, then, was your CS50 education, weeks 0 through 9. We have just one more left next week. So we'll see you then. [applause] [music] >> All right, this is CS50, week 10, the very end. And we will end today's class just as we ended week zero, with a little bit of cake outside in the transept. But over these past 10-plus weeks, if you've been feeling like it was that proverbial fire hose sort of hitting you in the face with so much new content, so many new skills, so many new challenges, realize that you're in very good company. And we can officially declare nonetheless that if you started the class among those less comfortable, you are officially, after today, no longer less comfortable. You're at least somewhere in between. And if you were in between, you're more comfortable.
And if you were more comfortable, you're perhaps now most comfortable among those here. But keep in mind, as per CS50's syllabus, what does ultimately matter in this course is not so much where you end up relative to your classmates, but where you end up relative to where you yourself began. And that's taken into account come final projects, come final grades. But most importantly, that's really what's most important educationally in general: that delta from week zero to, in our case here now, week 10. So if it's any reassurance, something I like to bring up around this time is just how badly I did in CS50, in like the very first problem set. Like, I didn't even get hello world right somehow in the fall of 1996. So here's a photograph of my homework assignment for assignment one. It was a program to print hello world on the screen. I was incredibly detailed with my comments, even commenting that main is main, which is not the way you're supposed to program, even telling the TF where my file ended, which is not really necessary. And I got minus two for not even following directions correctly. So take some comfort in that. Even if by problem set 9 you're still getting points off, you're hopefully, at least in my case, in some very good company. It only gets better and easier and faster in time. But the whole course ultimately has really been about this picture, right? Problem solving is computer science. And you have inputs, which is the problem to be solved. You have the outputs that you want to get to, which is presumably the solutions there, too. And inside of that proverbial black box are these algorithms, step-by-step instructions for solving some problem. And I pulled up my own notes from CS50's first lecture some 25-plus years ago, too, where I wrote down this in my horrible handwriting, to this day.
But I noted that what an algorithm is is a precise sequence of steps for getting something done, which is pretty much what we now say. I noted that programming itself, as we have for weeks now, is the process of taking an algorithm and putting it into a language that a computer can process, and that's what you've done in Scratch and C and Python and SQL and JavaScript and anything in between. And most important, at least my takeaway that day when it comes to algorithms, is precision and correctness. And indeed those are points we've made, perhaps not as emphatically, over the past several weeks as well. But we thought we'd see just how much those two lessons in particular have sunk in by doing a bit of an exercise, some CS50 Pictionary, in this, our last lecture altogether this term, for which to begin we need one brave volunteer to come on up on stage. Who would like to volunteer? Who? How about, okay, over here. We never call from the middle of the section. Come on up. Come on up. A round of applause for being so brave. Nice. [applause] All right, come on over. And in just a moment, let's go ahead and do introductions. First, if you want to come up over to the middle of the stage and introduce yourself to the world. >> Hi, I'm Gia. I'm a freshman. >> All right. Nice. Nice to meet you. Thank you for joining us. So, what we're about to do is Gia is going to look at my screen, where there's going to be a picture on a white screen. All of you presumably have a white sheet of paper in front of you that you grabbed on the way in. If you don't, just grab one from a friend or your binder or the like. And if you really don't, that's okay, too. But hopefully everyone has a pen or pencil, or someone near you does. And what, Gia, we're going to ask you to do is program the audience to draw what it is you see on the screen. You can say anything you want, but you may not use any physical gestures or the like. Verbal programming only. >> Okay. >> All right.
Come on over to the lectern, and in just a moment Gia, and only Gia, will see what is actually here on the screen. So, step one for your audience. >> Okay. So, the first thing that you need to do is draw two lines right next to each other. Two vertical lines. Okay. >> Okay. >> Step two. >> Step two. Once you have done that, you need to draw three dots: one above those two vertical lines, one right in the middle between those two vertical lines, and one on the bottom of these three vertical lines, but beneath those two vertical lines. Yeah. So, three dots. >> Okay. Step three. >> Step three is, on the top of the left vertical line, you're going to connect a line from that position to the top dot that you drew. And then on the top of the right vertical line, you're going to connect that position to the top dot that you drew. >> All right, step four >> is remember that top left position? You're going to connect that to the middle dot that you drew. And then the top right of the vertical line at the... Yes. You're going to connect that to the middle dot of the line that you drew. >> Got it? >> And then step five: on the bottom left of your left vertical line, you're going to connect that position to the bottom dot that you drew. And then on the bottom right of the right vertical line, you're going to connect that position to the bottom dot that you drew. And now, from the middle dot to the bottom dot, you should have no line in between. And you can now draw a line between those two dots. >> Step six, and the last. >> I think you should be done. >> All right. A round of applause then for our programmer. Let me give you a little something >> if you want to take a seat. So now what Kelly and I are going to do is very quickly collect your execution of this program, and we'll see just how it went with Gia as the programmer. If you want to just reach out and hand me or Kelly over there any of your handwritings. We don't need all of them. Just a representative sample will suffice.
If you're proud of your work, extend your hand quite a bit. Okay. Very proud. Okay. >> Okay. >> Okay. Okay. One more. One more. That's okay. All right. All right, I'm going to run back to the stage. Okay, it's okay if we didn't grab yours. All right. [panting] All right. Thank you to Kelly for grabbing these as well. So, without having seen any of these, here is how you all interpreted Gia's instructions. So, here's one interpretation. Okay. Perhaps similar or different from your own. Here's another: several vertical lines, question mark. Okay. Here is a very narrow one. All right. And let's see if we got any other variants thereof. Actually, the rest of them are pretty consistent. So, Gia, if it's any reassurance, I'm seeing a lot of ones that look like this. Here's another that looks like this. And here's yet another that looks like this. So, if you're wondering where we're going with this, if I go ahead and reveal what it was Gia was looking at on the screen, she was in fact having you draw this here cube. So, some of the takeaways here. Suffice to say, not all of that went well. But why was that? Well, I dare say it was very easy to get confused, I think, Gia, in some of your words, because you had in your mind's eye exactly what it was you were drawing. And of course, it was right there on the screen. But we didn't leverage, at least in Gia's instructions, any abstractions. I dare say it might have been a little bit easier for all of us if maybe she had just teed things up by saying, "All right, everyone, we're going to draw a cube," for instance, which is indeed an abstraction over these lower-level details that she was focusing on. But perhaps there could have been another approach altogether, which is even more pedantic. For instance, a lot of the earliest drawing programs, and even worlds like Scratch, sort of take for granted that you have a coordinate system, like X's and Y's, and you can go up, down, left, and right.
So, an alternative to just saying, "Hey, all, draw a cube," which could be subject to interpretation, because is the cube like this, or is it like this, rotated? So, we still would have needed more information than just a cube from Gia. But here, maybe an alternative approach would have been to really get into the weeds and say, "Put your pen at the top of the page and then draw a straight line to the southwest, for instance, and then draw another line of the same distance to the south and then to the southeast," and so forth. And it could have been in terms of degrees. It could be directionally in that way, but it might not have been clear to anyone what it was we were drawing until enough of the lines suddenly appear on the screen and then, voila, you see that we've been drawing a cube this whole time. So the degree to which we're precise, and the level of abstraction that we operate in, is incredibly important, whether it's for another human to understand us, for an AI to understand us nowadays, or anything in between. All right, why don't we go ahead and flip things around a bit for this? Why don't we go ahead and get one more volunteer to do something a little different here on stage? One more. Okay, how about here on the aisle? Come on down. Round of applause for this brave volunteer. Come on down. >> [applause] >> All right. So, in this exercise, we're going to flip things around. So, you all will be giving the instructions verbally by just shouting them out. And our volunteer, whose name is >> Presley. >> Preston. >> Presley. >> Presley. Presley, you want to say a quick introduction? >> Yeah. My name is Presley. I'm a freshman living in Stoughton House. >> Nice. Well, welcome. Come on over to the easel here. And we have a black marker for Presley here. And the only thing that we ask is that you not look up or behind you, because the answer is going to be right there on the screen.
But everyone else is welcome to look up or over to the TV screen. And if you want to go ahead and face the easel here and, as you draw, just make sure to kind of open up after each stroke of the pen so that everyone can see what you have done. All right. So no looking up as of now, because what the audience is about to do is to program you to draw this on the screen. Oh, way to encourage him. Okay. So, step one, feel free to just raise your hand and we'll shout them out. >> Oh, I heard draw a circle over here. >> But not too big. I heard over here a stick figure. >> Good abstraction. You're going to end up drawing a stick figure. But we should probably be a little more helpful than that. So, let's do the hand thing just so we can be more precise and not overwhelm Presley. There was a hand over here. Yeah. And back. >> Draw a line down. >> Draw a line down from the circle, Presley >> from the bottom >> from the bottom of the circle. Okay, someone else. >> Actually, let me rewind. Sorry. Say it again. >> Draw two diagonal lines from the line you just drew. >> Well, I don't think the audience likes this. Wait, let's... Oh, >> okay. Okay, that's what we were told. Next step, someone else. >> Good one. Okay. Extend the original vertical line to be about the same height as the circle. >> Okay. Yeah, that's good. Good feedback. All right. Someone else. Next step. Next step. Yes. >> Draw two diagonal lines from the bottom of the line. >> Nice. Draw two diagonal lines from the bottom of that line that look like legs. Good use of detail and abstraction. Okay, nice. Next step. >> Anyone? We're close. Yeah, over here. line >> on the left. So, you're going to draw a speech bubble to the left of the head with the word hi, capital H, with a short line. >> No bubble, just hi. >> And you wanted to clarify one other detail. And then a line from hi to the face. >> A line from hi to the face >> with space in between. Okay. No, you're doing great. It's okay, Presley. Okay.
Hang [laughter] in there. Okay. Final step or two. Next step. Anyone at all. >> Feel free to shout it out. >> Adjust the arms to make them look like they're running. >> Adjust the arms to make them look like they're running. Good luck. >> Draw a perpendicular line from the left arm. >> Oh, I like that. Draw a perpendicular line from the left arm >> to the bottom >> to the bottom. >> Okay. And lastly, one final step. >> Same side as... Yeah, it's permanent. I think we need a final touch on the other arm, maybe. Yes. One final step. >> Anyone? >> Draw a perpendicular line diagonally to the left >> of the arm >> of the right arm. Just a little bit. >> Just a little bit. >> All right. I think we've withheld our applause long enough. Presley, if you want to take a step back and look at what they were trying to get you to draw. A round of applause. [applause] So, here too, let me... Here you go. For your dorm room, if you would like. Okay. And a little Super Mario as well. All right. So, here too, I think you were the problem this time. Round of applause for Presley. And of course, since it's, you know, permanent ink, it's easy to sort of go off the rails early on and make a mistake. But I think that was actually a nice mix of low-level details, like the directions of the lines and the lengths thereof, and also some abstractions, because I do dare say someone shouting out that it is to be a stick figure gave him a much more helpful mental model. So that might be sort of the comments on top of the function, but when we really got into the weeds of implementing that function, it was more akin to step-by-step instructions for solving this here particular problem. So my thanks to Presley for bearing with us with that one as well. So beyond this, where have we been up until now? If we look back at the past several weeks, this is sort of the trajectory on which we've been. So we started with Scratch, from scratch, literally in the very first week.
The goal of which was to introduce you to some of those procedural fundamentals, like what a loop is, and a conditional, and Boolean expressions, and variables, which have pretty much recurred in different forms and different languages over the weeks since. Thereafter we transitioned to a more traditional language, C, which many of you will never use again, and admittedly even I only use it for like a month or two of the year during CS50 itself. The intent was for it to be this incredibly foundational language that so many other languages today are built on top of. Case in point, the interpreter that you might use for Python itself can be written in C. And that speaks to how we sort of talked about bootstrapping from one language to another, from low level to high level and beyond. Arrays and algorithms, memory and data structures, all of that is sort of omnipresent in computing, in programming, and the like, even though you might not need to worry as much about managing your own memory in modern languages like Python, because good programmers, better programmers, have figured out how to solve those problems for you in the language itself or in the libraries that you're using. You can take for granted now that you at least know what a hash table is, what a linked list is, what the trade-offs are among those, what the running times are. And that's what computer scientists and software engineers think about and talk about and whiteboard about in the real world when trying to implement algorithms of their own for real-world problems or implementing real-world products. And then, of course, over the past few weeks we've sort of used that as a stepping stone to talk about very modern programming paradigms, most recently web programming. And even though we didn't use it explicitly in the class, mobile programming is increasingly based on HTML and CSS and JavaScript, which might be something some of you will tackle for your own final projects.
And you can't escape now using or seeing or leveraging somehow artificial intelligence. And among the goals for today is to at least point you in the direction of tools that, now having finished problem set 9, you are welcome and encouraged to use for your final project, so that you can build all the more, and all the more successfully, than even some of your predecessors just a few years ago could have, now that your own work and your own know-how can be amplified by the impact of AI itself. This of course now brings us to today, the end, but we wanted to give you a sense of where you can go from here on out. So the intent of the final project is to be the very first of hopefully many projects that you decide to spec out for yourself. Like, every problem set thus far has been written by me and the team, and you've been following our instructions step by step. The final project takes all of those training wheels off. And even though you are welcome and encouraged to borrow code from, say, problem set 9 if you want to do something web-based, or even earlier if you want to do something that's more similar to past psets, the goal is to make it ultimately your own. And even, if you want, start with a completely empty window and just a blinking prompt and build something of your own, setting out for yourself, as you've seen in the specification, a good goal, which you intend to meet no matter what, a better goal, which is a bit more of a stretch, and a best goal, which in practice rarely ever happens with software. To this day, 25-plus years since taking CS50 myself, even I consistently underappreciate just how long it takes sometimes to solve problems. But that's beginning to go away, at least to some extent, thanks to AI, where at least now you essentially have a junior colleague next to you who can help solve bugs for you, point you in the right direction, even tackle features as well.
All that we ask for this final project is that you build something of interest to you, that you solve an actual problem, that you impact campus, or that you, as we say in the spec, change the world, and try to achieve something, try to create something that outlives the course itself over these final few weeks of the class, and even continue on with it if you'd like in January and beyond. For now, this is the so-called CS50 Charades, for which we need two teams of three. So, if you're sitting there in a group of three friends total, or we'll form one up here live. So, come on up as our first volunteer. Need five more volunteers. Feel free to volunteer the persons next to you. Three in a row. How about two more over here? One. And how about two on the end? Come on up. All right. And a round of applause for these six here volunteers. [applause] All right, let me give you one microphone. Let me give you a second microphone. And Kelly, if you want to come on up as well. I think these three seem to know each other already. So, we'll have them be one team. If you guys want to be another team as well, come on up. Let me take one microphone actually for the other team. All right. And how about quick introductions to this team here. And first, we need a team name from you all. You haven't had time to think about this. >> Team A. >> Okay. So, team A is who? >> I'm Leah. I'm a first year and I'm in Holworthy. >> Welcome. >> My name is Stephen. I'm a freshman in Canaday F. >> I'm Charlotte. I'm a freshman and I'm also in Canaday F. >> All right, let's do introductions on the other team as well. You are going to be team >> Awesome Sauce. >> Awesome Sauce. Okay. Versus team A. If you want to go ahead and introduce yourselves here. >> Hi, my name is Jenny Pan. I'm a freshman in Hollis. >> Hi, my name is Noah. I'm a freshman in Hurlbut. >> And hi, my name is Marie and I'm a freshman. Sorry, I'm a freshman in Canaday.
>> All right, welcome to both of our teams here. And among the goals now, let's leave one microphone with each team, is to play a bit of charades, whereby one of you in a moment is going to be responsible for acting out a word that you see on the screen. So, we're going to put on this screen and this screen over here some term that relates to CS50 somehow, and that person's goal over the course of 60 seconds is going to be to act that out in such a way that their teammates can hopefully guess what the word is. We'll give you 60 seconds at a time. Kelly has kindly offered to keep score. And if you solve it in fewer than 60 seconds, we've got another word for you, and another word. And we'll see how many points you can accrue over the course of those 60 seconds. And depending on how this goes, we'll do maybe one or two rounds in total. Questions? >> How many skips do we get? >> How many skips do you get? I guess you can skip as many as you want until we run out of questions. >> Oh. Oh, >> but try not to run through all of our questions. All right. Any questions, though, beyond that? All right. So, if you guys want to step off stage over there, why don't we have team A begin? So, one of you, Leah, if you're holding the mic, if you want to be the charader, let's go ahead and have you stand here so you can see the screen. And we only ask that you two not look up, because the answer is going to be right there. >> All right. And you should just shout out the word that Leah is acting out. Question. >> Acting only? Charades? >> Speaking? >> Yeah. Yeah, I can't speak, because that would kind of defeat the point. So, yes, just acting out. Just acting out physically. All right. >> I'm going to go over here. Give me just a moment to get the slides ready with your questions. And Leah, the first clue. Oh, and Kelly's going to be timing you. 60 seconds to accrue as many points as you can. All right, here we go. Go. Act that out. >> Oh, that was weird. Thank you. Sorry. Yes. Act out.
This is CS50. All right. No. Act this out. Please go. >> Loop. Calling a... recursion. >> Yes. One point. [applause] >> coming >> uh an array, linked list >> abstraction >> snake. >> Python. Python. >> Yes. Python. >> Duck. The duck. >> Nice. >> Binary. Uh >> one, zero >> binary digit, bit >> byte >> one, zero. It's definitely binary. ASCII. >> Want to pass? >> Linked list, array. >> Yes. Array. >> Loop. >> Yes. Loop. >> Time. Time. All right. Very nicely done. [applause] All right. Five is the score to beat. So, if you guys want to step over here, if one of you has the mic, go ahead and assume the same roles. Five is the score to beat. All right. Five is the score to beat. All right. Here we go. Final round. First word. And you guys just make sure you don't look up. Go. Head [laughter] node >> algorithm >> input, algorithm [laughter] >> these are hard [laughter] No. [laughter] >> Sure. You have to act it out. Act it out. [laughter] >> Oh, they go. Run time. Run time. What's that? >> Tree. >> Yes. Tree. >> Next one. >> Oh my god. >> Next one. >> Binary search. >> Binary, Boolean. No. A merge s, call, phone call >> function. >> It was binary search, wasn't it? >> What was binary >> phone? Oh, that's time. All right, but a round of applause for our team Awesome Sauce. [applause] >> Okay, we have some parting prizes for you, your very own Super Mario PEZes for you guys as well. I'm glad we squared away the ability to pass, though, on the question, so thank you for that. All right, so admittedly pretty hard. Our thanks to all of these volunteers for playing that out. Allow me to turn our attention back to here in just a moment, to where else we can go from here. So up until now we've been using Visual Studio Code for CS50 at the URL CS50.
Recall that this is just an adaptation of a commercial tool called GitHub Codespaces, which is like a cloud-based version of Visual Studio Code itself, or VS Code, which is a largely open-source tool from Microsoft that's incredibly popular in the industry. Which is to say, even though we have the CS50 library in there, and we turned off by default some of the menu options, and we disabled AI, it is the tool that so many programmers around the world do use every day to write code. So you have been learning all this time sort of industry standards in that sense. It is now time, if you so choose, though you are welcome to keep using this for your final project if you feel more comfortable with it, to drop the "for CS50" and actually install on your own Mac or PC Visual Studio Code itself. You can go to this URL here. It's fairly straightforward to install it. But invariably you'll run into probably some technical support headaches depending on the language that you're trying to use with it. For instance, if you're trying to use it with Python, you'll probably also have to download and install Python onto your computer, at least if you want the latest version. And just know a priori that sometimes stuff just happens and it just doesn't work, and you have to Google or ask ChatGPT, and that's fine, and honestly that's kind of normal. But this is also why we don't do any of this in week zero of the class, so that we can focus on hello world and Mario and cash and credit and get into the interesting parts of computing and programming, and not frustrate you with technical support challenges. But now, given that all of you are somewhere in between or among those more comfortable, you're now ready to deal with those same technical challenges yourselves. But who knows, maybe it will go perfectly smoothly.
You can go to CS50's own documentation, because if you want to be able to use all of the same software that CS50 has pre-installed, you can use a technology known as containerization, with a tool called Docker, and actually run a CS50 environment on your Mac or PC, or even in the cloud, but still run VS Code on your own Mac or PC. Among the upsides of which are that you're not necessarily dependent on the cloud. You can do everything offline, which is useful in general. You can do things more quickly sometimes if you're using the full capabilities of your own computer and not just a browser. So this is generally how programmers approach their code, using something like VS Code or alternative products.
And in fact there's a bunch of others out there, but perhaps the trendiest right now are these three here. Not just Visual Studio Code itself, but a tool called Cursor and another one called Windsurf. There are dozens of other text editors, and what are often known as integrated development environments, which tend to have even more features, that you can download for free or commercially on your own Macs, PCs, and the like. But you can't go wrong transitioning from CS50 to VS Code on your own Mac or PC, if only because you're already familiar with it.
As for the command line, those of you with Macs might have found somewhere in your Utilities folder a program called Terminal. If not, poke around there later today and you'll see that all this time you've had a command line interface available to you on macOS. Windows has something similar as well. They don't necessarily come with all of the same tools that we've been using within cs50.dev, but if you're a Mac user and you go to this URL here, or you're a Windows user and you go to this URL here, or if you're a Linux user, you probably know all of this already, so there's no URL for you there.
You can install some of those same tools on your Mac or PC and feel all the more at home doing things in a command line as well.
Git. This is something that we in CS50 actually abstract on top of. This is essentially the de facto standard nowadays for collaborating with other people, using a central cloud server in order to share your code with it and, in turn, with other people, and for versioning your code so that you keep track of multiple versions thereof and changes that you've made. Go to this URL here if you would like, and you'll see a tutorial by CS50's own Brian Yu introducing you to actual Git, because we've been sort of abstracting away this particular tool by just doing it all automatically for you. If you've ever gone through your timeline in cs50.dev, being able to roll back to previous versions of your code, we're just using Git, but we're automatically running this command for you.
If you want to collaborate with partners for your final project, you can use Git. However, I will encourage you to alternatively use Visual Studio Code's Live Share feature, which allows one of you to log into your codespace, click some buttons, and then share access to your codespace with the friend or partner with whom you're working on the project, and you can both, in real time, like Google Docs, edit the code or different files therein using that one codespace. A little easier than getting onboarded with Git, at least.
Hosting a website, if this proves of interest for your final project or even after the course. If it's a static website, two popular places to go, if only because they offer free tiers, are what's called GitHub Pages, which you can use to just host HTML, CSS, and JavaScript with no Python, no Flask, no backend, or Netlify, a popular company nowadays too, which has an entry-level account for which you can sign up for free.
If you just want to have, like, a portfolio website, if you're an artist or a programmer and you just want to have static content that you write once and deploy, these are good starting points, but they are not the only ones.
Hosting a web app. So, this list gets even longer. And all of these recommendations are essentially curated by the teaching staff. So, they're all opinionated, but these are perhaps the most common places you can go. Amazon, Microsoft, Google, Cloudflare, they all have student-type accounts. So, if you use your .edu email address, for instance, or some other form of proving your status as a current student, you can generally sign up for discounts and free access to a lot of these same services as well, without having to pay while you're just learning along the way. GitHub has something similar called the Student Developer Pack. And then a couple of other companies for hosting web apps that have been popular are Heroku, Vercel, and bunches of others. So by web app we mean not just HTML, CSS, and JavaScript, but maybe some Python, maybe some JavaScript on the server, maybe Ruby, yet another language, or any number of others. When you actually need a backend in addition to the front end, and maybe you need a database as well, this would be the place to start, whether it's at the CS50 Hackathon or beyond.
And nowadays, this is a slide that didn't even need to exist a couple of years ago: asking AI. Again, for your final projects, you are welcome and encouraged to amplify your own productivity with AI, not by having it do the work for you, but by moving away from the duck, which by design has been fairly limited and meant to be a good teacher, but not necessarily one that's going to be a good partner when it comes to building your final project. So ChatGPT, Claude, Gemini, GitHub Copilot, OpenAI Codex, and v0 are all popular tools right now that you might want to play around with.
The easiest of these to use, perhaps, if you're not familiar with, say, ChatGPT already, would be GitHub Copilot, only because you can enable it within your CS50 codespace by following our own documentation at cs50.readthedocs.io, where we'll tell you the sequence of steps via which you can re-enable AI, now that you're allowed to for your final project, and turn on all of those features that were disabled by default.
And then there are still humans out there. It remains to be seen just how popular these websites are in the years to come, for better or for worse. But among the places that programmers and technophiles have gone for years are Reddit, Stack Overflow, and Server Fault, where there's a rich history of questions and answers that, ironically, all of those AIs have been trained on, which unfortunately means some of these might be driven out of business eventually, in some sense, if we're all just turning only to AI. But when you actually want that human component, these are still good places to go.
And then news. Two of the many places you can go for news in technology, computing, and computer science more broadly: TechCrunch is still a good one, and Hacker News, so to speak. And then you might have some of your own popular choices as well.
And then, with some bias, take other classes. CS50, besides this undergraduate class, has a rich history now over the past decade of creating all the more OpenCourseWare. So courses in more Python, more SQL, a language called R, cybersecurity, game development, and more. All of those are linked at this URL here, edx.org/cs50, where you need not pay or sign up beyond auditing the course, and all of the content is freely available. Something for winter break, for instance, if you want to dive a little more deeply into some subject for the sake of your final project, your professional aspirations, or even just to prepare for spring term.
And then over the coming weeks, too, CS50 itself will be soliciting interest in applications for becoming a teaching fellow, or TF, or a course assistant, or CA. If you would like to get all the more involved as a teacher of CS50 next fall, do follow the application link that we will soon circulate via email. And do stay in touch, too, if you just enjoy answering other people's questions or seeing what the pulse of computing is. At this URL here is a whole bunch of CS50's own communities, largely in social media, via which you can follow along at home in the months and years to come too.
So, a few thanks before we do one final game altogether, to all of the people who have been making this course possible. So, our friends at Memorial Hall, who bring us into this beautiful space and make it possible for us to have, of all things, a class in such a space. Our friends at ESS, who help with the audio each and every week in CS50. The restaurant Changsho down the road; we hope you'll continue to visit our friends there. Wesley Chen is a good friend of ours and the manager. Please tell him you're from CS50 and I'm sure he'll be delighted to see you. And then CS50's own team, most of whom are in back there or sitting next to you with cameras, without whom the course wouldn't be possible. And of course CS50's own teaching fellows and CAs, just a few of whom posed here for this photo. If I could invite you all to give everyone here a round of applause, my thanks to all of them. >> [applause] >> And then of course the CS50 duck should be thanked as well. Okay. Thanks. [applause] And CS50's own Rongxin Liu and some of our own former teaching fellows and students, who have been behind the development of the duck that you've gotten to know over these past several months.
All right, if Kelly could join me again on screen, the only thing between us and cake is a final game, namely a quiz show in which all of you can partake.
Here we go. Question one. What is the largest number an 8-bit unsigned binary digit can represent? 256, 128, 255, or one? Starting strong, and keep in mind all of these questions came from you all, because we asked you recently for review questions that are now on the screen. Again, the timer is ticking, and the most popular answer was 255, which, I think if we click once more, we'll confirm was in fact the correct answer. So why is that, and why is it not 256? Well, if we start counting from zero, as we always have, that's consuming one of the 256 possibilities. So the largest number that we can represent with something that's 8-bit and unsigned, which means no negative numbers involved, is indeed going to be 255. Treasure that information now, always.
All right, next question from Kelly. Which issue is at the center of the year 2038 problem, which hopefully you added to your Google Calendars a few weeks back? Integer overflow, malicious inputs, SQL injection attacks, or memory leak? Which of those is at the core of the year 2038 problem? All right, let's go ahead and reveal the number one answer, with 92% of you saying integer overflow, which is in fact correct, because we're still in the habit of using 32-bit integers to keep track of time from the so-called epoch, which was January 1st, 1970. And unfortunately, we humans aren't great at planning ahead. And so we're going to run out of permutations of 32 bits by a certain date in the year 2038, unless everyone upgrades their computers to 64-bit counters, which thankfully most every piece of modern hardware nowadays is using already. Your Macs, your PCs, and your phones. So hopefully this will really be a non-event, but hopefully you'll think of us in CS50 in, you know, 10-plus years when your Google Calendar reminder goes off.
Question three. Which of the following is not a step of compiling? Linking, preprocessing, assembling, or interpreting? Bit more of a challenge. Which of these is not a step of compiling?
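To make the first two answers concrete, here is a minimal sketch in Python. It assumes, as is conventional but not stated in the lecture, that the 32-bit time counter is signed and counts seconds since the epoch; the function names are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def max_unsigned(bits: int) -> int:
    """Largest value an unsigned integer of the given width can hold."""
    # There are 2**bits bit patterns, but counting starts at zero.
    return 2 ** bits - 1


def max_signed(bits: int) -> int:
    """Largest value a signed (two's complement) integer of that width can hold."""
    # One bit is spent on the sign, so only bits - 1 remain for magnitude.
    return 2 ** (bits - 1) - 1


# The epoch mentioned in the lecture: January 1st, 1970 (taken here as UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_representable_moment(bits: int = 32) -> datetime:
    """Last second a signed counter of the given width can represent."""
    return EPOCH + timedelta(seconds=max_signed(bits))


print(max_unsigned(8))              # 255
print(last_representable_moment())  # 2038-01-19 03:14:07+00:00
```

One second after that moment, a signed 32-bit counter would wrap around to a negative number, which is the integer overflow at the heart of the problem.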
All right, almost 200 responses coming in. All right, why don't we go ahead and reveal the most popular answer, with 54% of you saying interpreting, which is in fact correct. Recall that we talked about compiling. Compiling itself is just one of several steps. There is in fact the preprocessing step, which takes care of any of the lines in C that start with a hash symbol, #include, #define, and the like. That's preprocessing. There was then assembling, or rather, there was then compiling, which actually compiled your code into assembly code. There was then the assembler, which would actually take it down further to machine code. And then linking. This is for the 29% of you who chose it. The linking step, recall, was taking your zeros and ones and combining them with, say, CS50's library's zeros and ones, and maybe the standard I/O library's zeros and ones, linking them all together to give you one executable program, like hello itself.
All right, next question. What does a pointer store? The name of a variable, the memory address of a value, the size of a value, or the value of a variable? Think for a moment. What does a pointer store? All right, about 200 responses in, and yes, the memory address of a variable, with 96% of you confirming as much. That is correct.
Question five. What is the running time of linear search? Big O of 1, big O of N, big O of N squared, or big O of N log N? Linear search running time. And recall that with something like search, you could get lucky. But if big O is the upper bound on our running time, you might not. You might hit the end of the list that you're searching. And so the running time of linear search is of course big O of N. It might be omega of one, but not big O of one. At least if we're considering what the worst-case scenarios might be.
All right, on to question six. What data structure follows the first in, first out principle? A queue, a linked list, a stack, or a hash table? First in, first out, aka FIFO. Which of these is FIFO? All right.
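The linear search answer above can be illustrated with a short Python sketch. This is only one possible rendering, not code from the course; the function name and the idea of returning a comparison count are assumptions made here to show the best case (one step) against the worst case (N steps).

```python
def linear_search(items, target):
    """Scan items left to right.

    Returns (index, comparisons), where index is -1 if target is absent
    and comparisons is how many elements were examined.
    """
    comparisons = 0
    for index, item in enumerate(items):
        comparisons += 1
        if item == target:
            # Lucky case: the target may be the very first element.
            return index, comparisons
    # Worst case: every one of the N elements was examined.
    return -1, comparisons


numbers = [4, 8, 15, 16, 23, 42]
print(linear_search(numbers, 4))   # (0, 1): best case, one comparison
print(linear_search(numbers, 99))  # (-1, 6): worst case, N comparisons
```

The worst case grows in step with the length of the list, which is why the upper bound is O(N) even though a lucky search finishes in a single step.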
First in, first out is in fact a queue, as you would hope if you're getting in line for a restaurant, for a store. You'd hope that if you're the first one in line, you're going to be the first one out, equitably speaking. And so it is in fact a queue. The opposite of that, in some sense, would have been a stack, whereby, when you think about the cafeteria trays, the first one in is actually the last one out. So LIFO instead for a stack.
All right, question seven. Which operator returns the memory address of a variable? An asterisk, a dollar sign, an ampersand, or a hyphen and a greater-than sign? Presumably in C, which returns the memory address of a variable? All right, let's see what everyone thinks. So the most popular and correct answer is the ampersand. This is the address-of operator. The asterisk, recall, in most contexts is the opposite of that. That's the dereference operator. It means actually go to an address. The dollar sign is not a thing in C. The hyphen and greater-than sign, though, is similar in spirit to a combination of the star operator and the dot operator, which means to dereference and follow a pointer to something inside of a struct, typically.
All right, question eight. Which SQL command is used to remove duplicate rows from a result set? Remove, unique, distinct, or clean? We didn't spend a huge amount of time on these keywords, but only one of them applies here. A result set is just the answers that you get back when doing your SELECT. And if you want to filter out duplicates, you can in fact say DISTINCT, which is correct. UNIQUE is also a keyword in SQL, but that is for when you want to define in your schema that a column's values are going to be unique, like an email address column. DISTINCT is how you filter out duplicates in your SELECTs.
All right, question nine. We're past the halfway mark. What does an HTTP code of 418 signify? Not found, I'm a teapot, forbidden, or unauthorized? 418. This too, if you know this one, moving forward, you'll be considered among the CS elite.
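The queue and stack contrast described above can be sketched in a few lines of Python. The function names and sample data are hypothetical; `collections.deque` and a plain list are just one reasonable choice of containers.

```python
from collections import deque


def serve_in_fifo_order(arrivals):
    """Queue: people leave the line in the same order they joined it."""
    queue = deque()
    for person in arrivals:
        queue.append(person)            # enqueue at the back of the line
    served = []
    while queue:
        served.append(queue.popleft())  # dequeue from the front
    return served


def take_in_lifo_order(trays):
    """Stack: the most recently added tray is the first one taken."""
    stack = []
    for tray in trays:
        stack.append(tray)              # push onto the top
    taken = []
    while stack:
        taken.append(stack.pop())       # pop from the top
    return taken


print(serve_in_fifo_order(["Alice", "Bob", "Carol"]))  # ['Alice', 'Bob', 'Carol']
print(take_in_lifo_order(["tray 1", "tray 2", "tray 3"]))  # ['tray 3', 'tray 2', 'tray 1']
```

The only difference between the two is which end items are removed from, and that alone is what separates FIFO from LIFO.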
Answers are coming in a little slower, but I'm a teapot is correct, which is not actually a thing or useful technology. It was in fact an April Fools' joke years ago, where a bunch of computer scientists got together in a room and wrote out an entire specification for what it means for a server to return 418, I'm a teapot.
All right, number 10. Where does malloc dynamically allocate memory from? The heap, the stack, global variables, or assembly? All right, heap is in fact correct. That's the sort of top part of the memory, even though top and bottom make no actual technical sense. It's just our artist's rendition thereof. The stack, recall, is what is used when functions are being called. Every time a function is called, it gets a so-called frame on the stack. That's where your local variables and your arguments get put. But if in C you use malloc, it does in fact end up on the heap.
In C, if you allocate memory with malloc but forget to call free, what problem can occur? A memory leak, segmentation fault, stack overflow, or all of the above? If you allocate memory with malloc but forget to call free, what problem can occur? All right, most popular answer is in fact memory leak, which is correct. You could imagine scenarios in which you also get a segmentation fault and/or a stack overflow, but those aren't direct consequences of not calling free. Those are generally the consequence of using too much memory, for instance, or, in this case, doing something wrong with your memory. So interrelated, yes, but in terms of not calling free for each malloc, this is what's going to happen by definition. All right, well done there.
Next question, which is 12. What does this domain name give the web page of? safetyschool.org. Is it Harvard University? Is it Princeton University? Is it Yale University? Or Columbia University? All right. Recall that this was in the context of our HTTP redirections. Yes. Interesting. Yes.
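As a small aside on the status codes offered in question nine, Python's standard library happens to know all four. The sketch below assumes Python 3.9 or later, since that is when `HTTPStatus.IM_A_TEAPOT` was added; the `describe` helper and the `QUIZ_OPTIONS` list are hypothetical names used only for this illustration.

```python
from http import HTTPStatus

# The four numeric codes behind the quiz options (assumes Python 3.9+ for 418).
QUIZ_OPTIONS = [404, 418, 403, 401]


def describe(code: int) -> str:
    """Return the numeric code followed by its standard reason phrase."""
    status = HTTPStatus(code)  # raises ValueError for an unknown code
    return f"{status.value} {status.phrase}"


for code in QUIZ_OPTIONS:
    print(describe(code))
```

Running it prints one line per code, which makes it easy to see which number goes with not found, forbidden, unauthorized, and the teapot.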
In fact, Yale University. Some alum has been paying, like, $10 a year for, like, 20 years for this joke. safetyschool.org, if you visit it, returns an HTTP 301, with an HTTP header which says the location of it is in fact yale.edu.
All right, 13. Three to go. What is the purpose of DNS? To encrypt data sent over the dark web, to find the nearest coffee shop for you, to protect your location against hackers, or to translate domain names into IP addresses? What is the purpose of DNS? If helpful, domain name system. All right, about at the 200 mark, and the correct answer is indeed to translate domain names into IP addresses. That is a server that is on your home network, on your ISP's network, on your campus's network, your corporate network, that just answers questions like that for you.
All right, second to last question. Which of the following is not a built-in SQL feature to tackle race conditions? Begin transaction, commit, roll back, or enroll? We talked ever so briefly about this in the context of ending up with too much milk, recall. And the correct answer is indeed enroll. All three of those, even though you didn't have to use them for problem set seven or nine, are indeed features of SQL. But enroll is not a thing.
All right. And the very last question, and try to answer this as quickly as you can. What does Professor Malan say at the beginning of every CS50 lecture? Welcome to Harvard's computer science class. Hello everyone, ready to code? All right, this is CS50. Or, let's [laughter] get started with some programming. All of these questions were in fact written by you all. All right. And the correct answer, I'm pretty sure, with 98% of you saying so, is: all right, this is CS50. And all right, this was CS50. Cake is now served. [applause] [music] >> [music]
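The three SQL features named in the second-to-last question (BEGIN TRANSACTION, COMMIT, ROLLBACK) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not code from the course: the `fridge` table, the function names, and the `fail` flag are all hypothetical, and a single in-memory connection cannot show a real race, only the all-or-nothing behavior that transactions provide.

```python
import sqlite3


def make_fridge() -> sqlite3.Connection:
    """Create an in-memory database holding a fridge with no milk."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN / COMMIT / ROLLBACK statements below are the only
    # transaction control in play.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE fridge (item TEXT PRIMARY KEY, quantity INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO fridge (item, quantity) VALUES ('milk', 0)")
    return conn


def milk_count(conn: sqlite3.Connection) -> int:
    row = conn.execute("SELECT quantity FROM fridge WHERE item = 'milk'").fetchone()
    return row[0]


def buy_milk_if_needed(conn: sqlite3.Connection, fail: bool = False) -> None:
    """Check the fridge and restock as one all-or-nothing unit of work."""
    conn.execute("BEGIN TRANSACTION")
    try:
        if milk_count(conn) == 0:
            conn.execute(
                "UPDATE fridge SET quantity = quantity + 1 WHERE item = 'milk'"
            )
        if fail:
            # Simulated crash between the update and the commit.
            raise RuntimeError("simulated failure before commit")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # undo the partial work
        raise


conn = make_fridge()
try:
    buy_milk_if_needed(conn, fail=True)
except RuntimeError:
    pass
print(milk_count(conn))  # 0: the failed attempt was rolled back

buy_milk_if_needed(conn)
buy_milk_if_needed(conn)
print(milk_count(conn))  # 1: the second shopper sees the milk and buys none
```

With several connections writing at once, the check and the update would need to be protected together in the same way, so that two shoppers cannot both see an empty fridge and both buy milk.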

These subtitles were extracted using the Free YouTube Subtitle Downloader by LunaNotes.