Document Tree | ||
|
NAVIGATION Programming
Contact |
Caml is a programming language which is very different from other, "classical"[1] languages. While a program written in a classical languages consists mainly of statements, Caml programs are composed of functions. A statement takes its input values from the environment (such as variables), computes something, and stores the result again in the environment. To achieve more complex behaviour, you can execute one statement after another, or put your statements into loops. Functions are differ from this model, as there is no possibility to store something into the environment. For example, to determine the biggest number of an array, execute the following C statements: maximum = a[0]; for (i=1; i<n; i++) { if (a[i] > maximum) maximum = a[i]; }The variable "maximum" contains the biggest number which has been found so far. The characteristics of "maximum" is that it is a store, i.e. the value associated with it can be changed. In a functional language a different style is preferred[2]. Variables are normally considered immutable, and behave much more like variables in mathematics. Repetitive execution of the same code is expressed by recursive functions instead of loops: let rec determine_maximum maximum i = if i < n then if a.(i) > maximum then determine_maximum a.(i) (i+1) else determine_maximum maximum (i+1) else maximum in determine_maximum a.(0) 1Here, "let", "rec", "in", "if", "then", and "else" are keywords. The function "determine_maximum" takes two arguments, "maximum", and "i", and computes the maximum of "maximum" and all array elements starting with "i". This function is a local function, i.e. it is not visible in the whole program, but only in the expression following "in". Some notes to better understand this piece of code:
But what is the advantage of a functional programming style? It seems that you must write much more lines of code to get the same effect. The answer is that Caml functions are more powerful than the functions you find in classical languages such as C, and that the above example does not take advantage of this power. Here an alternative formulation which is much shorter: Array.fold_left (fun r x -> if x>r then x else r) a.(0) aThe whole expression is just an invocation of the function "Array.fold_left" with three arguments. The idea is that this function runs over the array "a" (given as third argument), and invokes for every element a user-defined function which gets the value of the element "x" and an intermediate result "r" and which computes a new intermediate result that is to be used for the next element. The first argument of "Array.fold_left" is the user function (i.e. the expression in parantheses); the second argument is the initial intermediate value which is used for the first element. Here, the intermediate result is the maximum of the array elements that have been seen so far. We can use "a.(0)" as the initial value because this number is equal to or smaller than the maximum of the whole array. The user function is a so-called anonymous function, i.e. a function not associated with an identifier serving as the name of the function. Such functions are written "fun arg1 arg2... -> expr" where "fun" is the keyword indicating that a function definition follows. This function takes "arg1", "arg2" and so on as arguments, and the result of the function is computed by evaluating the expression "expr". In our example, the first argument "r" is the intermediate result, and the second argument "x" is the current array element. The function body calculates the maximum of both values. Of course, it is possible to use a named function, too[4]: let calculate_maximum r x = if x>r then x else r in Array.fold_left calculate_maximum a.(0) aThe only difference of an anonymous function is the missing name. The function "Array.fold_left" takes a function as argument. This is important to mention because Caml makes no difference between ordinary values like numbers, and functions. "Functions are first-class values", i.e. they can be used in the same way as input to or output from computations like any other value of the language. It is time to tell the truth about the "let" construct. I have introduced it as a way to define local functions, but it is not restricted to this usage. In general, "let" binds a name with a value, i.e. every time the name is mentioned after that, the bound value is meant instead. These names are often called variables but this is a misnomer; actually the values bound to the variables cannot be exchanged by different values[5]. For example, these "let" bindings are possible: let pi = 3.14 in sin pi let calculate_maximum = fun r x -> if x>r then x else r in calculate_maximum 5 6As you can see in the second example, "calculate_maximum" is just a name for a function, and the literal value of the function is given by an anonymous function just like the literal value of pi is given by a numeric constant. If you write "let f arg1 arg2 ... = expr1 in expr2" this is only an abbreviation of "let f = fun arg1 arg2 ... -> expr1 in expr2". Although the function "Array.fold_left" is predefined[6] it is possible to define the function yourself, i.e. there is nothing "magic" about it. This is very illustrative as it demonstrates that functions are more powerful than statements; imagine you could define a "for" loop yourself in C ("Array.fold_left" has the same meaning as a "for" loop iterating over all elements of an array). This is a possible definition: let fold_left f r0 a = let n = Array.length a in let rec iterate r i = if i<n then let x = a.(i) in iterate (f r x) (i+1) else r in iterate r0 0The function "fold_left" takes the already known three arguments and refers to them by "f", "r0", and "a". Note that we need not to declare that "f" can only be instantiated by a function, it appears as every other variable. The definition of "fold_left" lacks an "in" clause; this means that the name "fold_left" is globally visible and can be used in all subsequent expressions. The function body of "fold_left" has two inner "let" bindings, one for "n", and one for "iterate". The bindings are nested as indicated by the indentation of the lines[7], i.e. "n" is visible in the function body of "iterate". The function "iterate" has two arguments: "r" is the intermediate result, and "i" is the index of the array element which is processed next. "iterate" invokes itself unless the end of the array is exceeded, and the next intermediate result is computed by calling "f" with the current result "r" and the current array element "x" which is "a.(i)". Observe that the local function "iterate" can access the variables "f", "a", and "n" which are introduced outside "iterate". In the code of "fold_left" there is nothing problematic with this, but in a functional language it is possible to evaluate a function in a different context than it was defined. For example, if we want to compute the maximum of all array elements not bigger than "m", we can simply write let m = ...whatever m is... in Array.fold_left (fun r x -> if (x>r || r>m) && x<=m then x else r) a.(0) aNote that if this expression returns a value bigger than "m" this means that the maximum does not exist. The problem with this construct is that the anonymous function passed to "Array.fold_left" contains a reference to "m". When the anonymous function is being evaluated this happens in the context of "Array.fold_left" which has been defined earlier, and "m" did not exist at this moment. Classical languages that allow functions to be passed to other functions solve this problem simply by forbidding it. For example, Pascal which knows local functions does not allow to pass local functions as arguments. This is not what the programmer wants as the construct makes sense. Caml solves the problem as follows: It takes the value of "m" at the moment the anonymous function is being defined, and this value is "incorporated" into the function, i.e. it substitutes all occcurences of "m", and the name "m" vanishes. For instance, if "m" happens to be 42 at the moment of definition, the function passed to "Array.fold_left" is actually: fun x y -> if (x>r || r>42) && x<=42 then x else rThis implicitly created function is called the closure of the original function, because the variable "m" would be a free (unbound, open) variable if the function was passed in its original form. Note that the translation of closures leads to very efficient machine code, as the substitution is deferred until the substituted variable is actually accessed. Of course, functions are the most interesting concept of Caml, but there are other features worth to be mentioned:
|