Where to begin? Entry points
We began with a blank canvas. Even though an empty file is technically a valid C program (thanks, Hacker News!), we still can’t get an executable out of it without the linker failing. This is for good reason: the operating system needs to know where to begin execution, and that position is called the entry point. In every C program, the entry point is a function called main.
Entry point definition #
I will be conducting the following exercises on OS X 10.8.5 with the Xcode command line tools installed. The programs I’ll be using are vim and cc. All the action will take place in the default Mac terminal program. As a result, I can only vouch for the following results on setups similar to mine.
So here we go! Our first non-empty C program - first.c. This time with the entry point in place.
main
After each block of code, I’ll compile first.c using the Clang compiler with the command below. I just won’t be repeating it every time. Right, back to business.
$ cc first.c
And, quite unsurprisingly, we get the following error messages.
first.c:1:1: error: unknown type name 'main'
first.c:1:5: error: expected identifier or '('
Let’s start with the first one.
What is a type? #
Our intention was to provide the entry point, main, so what went wrong? Well, we didn’t give the compiler any hints on how to interpret this program, so it had no way to assign meaning to the word main. Since main is the name of the entry point, in C terms we call it an identifier.
An identifier is a sequence of letters, digits and underscores that has to start with a letter or an underscore. You can then use such an identifier to name variables, functions and so on. Every identifier has to be given a type, sort of like adding meta information to a word: “Sam is a girl”. In C, an equivalent would be…
girl Sam
Or if you prefer valid C examples.
int number;
We call int an integer type. Why do we need types? One good reason is memory.
Memory #
Everything you put in a program is translated into machine code and placed in memory while the program is running. Computer memory is a collection of bits. You can imagine one as follows.
You can’t store much information in two states, on|off, 1|0. So memory is organised into collections of 8 bits, called bytes.
Every byte has an address, and you use that address to interact with a computer’s memory. How would you interact with the memory? By storing and reading data, of course. Using simple math, you can calculate that the maximum number you can store in one byte is 2^8 - 1 = 255. You can use your calculator to check that.
255 is a very small number, and even though the entire ASCII character set makes do with less than that, most types of data require more than a single byte of storage space.
Back to types… #
So types define how many adjacent bytes are required to store some data, which you identify with an identifier. Imagine that bytes are houses, and every house has a unique street address.
When the CPU tries to look up the value of an identifier, it first checks the address of the first house (identifiers point to their first byte). Then it starts knocking on the door of the first house, then the second, and so on. It does that with each adjacent house until the number of houses visited is equal to the number specified by the type of the identifier.
Great, now on to our second error.
Syntax #
C programs have to have a function called main as their entry point; we figured that one out already. The part we’re missing is a little bit of syntax to help the compiler understand that what we have here is actually a function. For that we use parentheses () following our function identifier.
Let’s rewrite our code with all this new knowledge.
int main()
Whoops, a new error message. Think of it this way: a new error message is better than the one you had last time.
first.c:1:11: error: expected function body after function declarator
This is another case of language-specific syntax: C requires that you provide curly brackets {} after your function identifier and its parentheses if you’re defining what the function does. This is to separate what happens inside a function from the rest of your program.
int main() {}
Boom! No more error message and we have created a brand new executable file. You can check that it’s actually there with this terminal command.
$ ls
a.out first.c
What’s next? #
You probably deduced that your executable is the file called a.out. Strange name, right? Well, it has a pretty straightforward meaning: assembler output. Fair enough. What’s next? You might want to go and run your executable file, but to explain how you do that, you first have to get nice and comfy with your terminal, which is a whole different topic.
So I guess that’ll be next.