Where to begin? Entry points

We began with a blank canvas. Even though the compiler accepts an empty file as a C program (thanks, Hacker News!), we still can’t get an executable out of it, because the linker fails. This is for good reason: the operating system needs to know where to begin execution, and that position is called the entry point. In every C program, the entry point is a function called main.

Entry point definition

I will be conducting the following exercises in OS X 10.8.5 with Xcode command line tools installed. The programs I’ll be using are as follows: vim and cc. All the action will take place on the default Mac terminal program. As a result, I can only vouch for the following results on setups similar to mine.

So here we go! Our first non-empty C program - first.c. This time with the entry point in place.

main

After each block of code, I’ll compile first.c using the Clang compiler, like so. I won’t repeat this command every time, though. Right, back to business.

$ cc first.c

And, quite unsurprisingly, we get the following error messages.

first.c:1:1: error: unknown type name 'main'
first.c:1:5: error: expected identifier or '('

Let’s start with the first one.

What is a type?

Our intention was to provide the entry point, main. So what went wrong? Well, we didn’t give the compiler any hints on how to interpret this program, so it had no way to assign meaning to the word main. Since main is the name of the entry point, in C terms it is an identifier.

An identifier is a sequence of letters, digits and underscores that has to start with a letter or an underscore. You can then use such an identifier to name variables, functions and so on. Every variable or function you name has to be declared with a type, sort of like adding meta information to a word - “Sam is a girl”. In C, an equivalent would be…

girl Sam

Or if you prefer valid C examples.

int number;

We call int an integer type. Why do we need types? One good reason is memory.

Memory

Everything you put in a program is translated into machine code and placed in memory while the program is running. Computer memory is a collection of bits. You can imagine one as follows.

[Figure: A single bit]

You can’t store much information in two states, on|off, 1|0. So memory is organised into collections of 8 bits, called bytes.

[Figure: A single byte]

Every byte has an address, and you use that address to interact with a computer’s memory. How would you interact with the memory? By storing and reading data, of course. Using simple math, you can calculate that the maximum number you can store in one byte is 2^8 - 1 = 255. You can use your calculator to find that out.

[Figure: Maximum byte value]

255 is a very small number, and even though the entire ASCII character set makes do with less than that, most types of data require more than a single byte of storage space.

Back to types…

So types define how many adjacent bytes are required to store some data, which you identify with an identifier. Imagine that bytes are houses, and every house has a unique street address.

[Figure: Bytes as houses]

When the CPU tries to look up the value of an identifier, it first checks the address of the first house (identifiers point to their first byte). Then it knocks on the door of the first house, then the second, and so on. It does that with each adjacent house until the number of houses visited equals the number specified by the type of the identifier.

Great, now on to our second error.

Syntax

C programs have to have a function called main as their entry point; we figured that one out already. The part we’re missing is a little bit of syntax to help the compiler understand that what we have here is actually a function. For that we use parentheses () following our function identifier.

Let’s rewrite our code with all this new knowledge.

int main()

Whoops, a new error message. Think of it this way: a new error message is better than the one you had last time.

first.c:1:11: error: expected function body after function declarator

This is another case of language-specific syntax: C requires that you provide curly brackets {} after the parentheses if you’re defining what the function does. The brackets separate what happens inside a function from the rest of your program.

int main() {}

Boom! No more error messages, and we have created a brand new executable file. You can check that it’s actually there with this terminal command.

$ ls
a.out   first.c

What’s next?

You probably deduced that your executable is the file called a.out. Strange name, right? Well, it has a pretty straightforward meaning: assembler output. Fair enough. What’s next? You might want to go and run your executable file, but to explain how to do that, you first have to get nice and comfy with your terminal, which is a whole different topic.

So I guess that’ll be next.

 