Chapter 11  File, I/O and system operations

11.1  File names

11.1.1  node, file, dir

   $(node sequence) : File Sequence
      sequence : Sequence
   $(file sequence) : File Sequence
      sequence : Sequence
   $(dir sequence) : Dir Sequence
      sequence : Sequence

The node, file, and dir functions define location-independent references to files and directories. In omake, the commands to build a target are executed in the target's directory. Since there may be many directories in an omake project, the build system provides a way to construct a reference to a file in one directory, and use it in another without explicitly modifying the file name. The functions have the syntax shown above, where each name in the sequence should refer to a file or directory.

For example, we can construct a reference to a file foo in the current directory.

   FOO = $(file foo)
   .SUBDIRS: bar

If the FOO variable is expanded in the bar subdirectory, it will expand to ../foo.

These commands are often used in the top-level OMakefile to provide location-independent references to top-level directories, so that build commands may refer to these directories as if they were absolute.

   ROOT = $(dir .)
   LIB  = $(dir lib)
   BIN  = $(dir bin)

Once these variables are defined, they can be used in build commands in subdirectories as follows, where $(BIN) will expand to the location of the bin directory relative to the command being executed.

   install: hello
      cp hello $(BIN)

The node function is like the file function, except that it is more permissive about the names of .PHONY targets: the file function requires an explicit .PHONY/ qualifier, whereas the node function does not.

   osh> .PHONY: foo
   osh> file(foo)
   - : .../foo : File
   osh> file(.PHONY/foo)
   - : <phony <.../foo>> : File
   osh> node(foo)
   - : <phony <.../foo>> : File

11.1.2  file-type

   $(file-type nodes) : String Array
       nodes : Node Array

The function file-type lists the type of node, as one of the following entries.

PHONY
the node is phony (declared with .PHONY).
OPTIONAL
the node is an :optional: dependency.
EXISTS
the node is an :exists: dependency.
SQUASHED
the node is a :squash: dependency.
SCANNER
the node represents the result of scanning a file for dependencies (declared with .SCANNER).
FILE
the node represents a normal file.
DIR
the node represents a normal directory.

11.1.3  tmpfile

    $(tmpfile prefix) : File
    $(tmpfile prefix, suffix) : File
        prefix : String
        suffix : String

The tmpfile function returns the name of a fresh temporary file in the temporary directory.
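
As an illustration, a minimal sketch of using tmpfile for scratch data might look like the following. The prefix omake-scratch and the suffix .txt are arbitrary choices. The fprintln function is assumed from the OMake printing functions, which are not described in this part of the chapter; fopen, close, and unlink are described later in this chapter.

```omake
# Ask for a fresh temporary file name (prefix and suffix are arbitrary).
TMP = $(tmpfile omake-scratch, .txt)

# Write one line of scratch data to the file.
chan = $(fopen $(TMP), w)
fprintln($(chan), scratch data)
close($(chan))

# Remove the file once it is no longer needed.
unlink($(TMP))
```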

11.1.4  in

   $(in dir, exp) : String Array
      dir : Dir
      exp : expression

The in function is closely related to the dir and file functions. It takes a directory and an expression, and evaluates the expression in that effective directory. For example, one common way to install a file is to define a symbolic link, where the value of the link is relative to the directory where the link is created.

The following commands create links in the $(LIB) directory.

    FOO = $(file foo)
    install:
       ln -s $(in $(LIB), $(FOO)) $(LIB)/foo

Note that the in function only affects the expansion of Node (File and Dir) values.

11.1.5  basename

   $(basename files) : String Sequence
      files : String Sequence

The basename function returns the base names for a list of files. The basename is the filename with any leading directory components removed.

For example, the expression $(basename dir1/dir2/a.out /etc/modules.conf /foo.ml) evaluates to a.out modules.conf foo.ml.

11.1.6  dirname

   $(dirname files) : String Sequence
      files : String Sequence

The dirname function returns the directory name for a list of files. The directory name is the filename with the basename removed. If a name does not have a directory part, the directory is “.”

For example, the expression $(dirname dir1\dir2\a.out /etc/modules.conf /foo.ml bar.ml) evaluates to dir1/dir2 /etc / . (the last entry is “.” because bar.ml has no directory part).

Note: this function is different from the dirof function. The function dirname is simply a function over strings, while dirof is a function on filenames.

11.1.7  rootname

   $(rootname files) : String Sequence
      files : String Sequence

The rootname function returns the root name for a list of files. The rootname is the filename with the final suffix removed.

For example, the expression $(rootname dir1/dir2/a.out /etc/a.b.c /foo.ml) evaluates to dir1/dir2/a /etc/a.b /foo.

11.1.8  dirof, tailof

   $(dirof files) : Dir Sequence
      files : File Sequence
   $(tailof files) : String Sequence
      files : File Sequence

The dirof function returns the directory for each of the listed files. The tailof function returns the tail part (the last part of the path).

For example, the expression $(dirof dir1/dir2/a.out /etc/modules.conf /foo.ml) evaluates to the directories dir1/dir2 /etc /, and $(tailof dir1/dir2/a.out /etc/modules.conf /foo.ml) evaluates to a.out modules.conf foo.ml.

11.1.9  fullname

   $(fullname files) : String Sequence
      files : File Sequence

The fullname function returns the pathname relative to the project root for each of the files or directories.

11.1.10  absname

   $(absname files) : String Sequence
      files : File Sequence

The absname function returns the absolute pathname for each of the files or directories.

11.1.11  homename

   $(homename files) : String Sequence
      files : File Sequence

The homename function returns the name of a file in tilde form, if possible. The unexpanded forms are computed lazily: the homename function will usually evaluate to an absolute pathname until the first tilde-expansion for the same directory.

11.1.12  suffix

   $(suffix files) : String Sequence
      files : String Sequence

The suffix function returns the suffixes for a list of files. If a file has no suffix, the function returns the empty string.

For example, the expression $(suffix dir1/dir2/a.out /etc/a /foo.ml) evaluates to .out .ml.
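
To compare the string functions of this section side by side, a short sketch such as the following can be run in osh or placed in an OMakefile. The println function is assumed from the OMake printing functions; the only result noted in the comments is the one already stated in the basename section above.

```omake
FILES = dir1/dir2/a.out /etc/modules.conf /foo.ml

# Base names; per the basename section: a.out modules.conf foo.ml
println($(basename $(FILES)))

# Directory part of each name.
println($(dirname $(FILES)))

# Each name with its final suffix removed.
println($(rootname $(FILES)))

# The suffix of each name.
println($(suffix $(FILES)))
```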

11.2  Path search

11.2.1  which

   $(which files) : File Sequence
      files : String Sequence

The which function searches for executables in the current command search path, and returns file values for each of the commands. It is an error if a command is not found.

11.2.2  where

The where function is similar to which, except it returns the list of all the locations of the given executable (in the order in which the corresponding directories appear in $PATH). In case a command is handled internally by the Shell object, the first string in the output will describe the command as a built-in function.

    % where echo
    echo is a Shell object method (a built-in function)
    /bin/echo

11.2.3  rehash

    rehash()

The rehash function resets all search paths.

11.2.4  exists-in-path

   $(exists-in-path files) : String
      files : String Sequence

The exists-in-path function tests whether all executables are present in the current search path.

11.2.5  digest, digest-optional

     $(digest files) : String Array
        files : File Array
     raises RuntimeException

     $(digest-optional files) : String Array
        files : File Array

The digest and digest-optional functions compute MD5 digests of files. The digest function raises an exception if a file does not exist. The digest-optional function returns false if a file does not exist. MD5 digests are cached.
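
The difference between the two functions can be sketched as follows. The file names foo.c and bar.c are hypothetical, and println and eprintln are assumed from the OMake printing functions; the try/catch form is the usual OMake exception syntax.

```omake
# Hypothetical file names.
FILES = foo.c bar.c

# digest raises a RuntimeException when a file is missing.
try
    println($(digest $(FILES)))
catch RuntimeException(e)
    eprintln(digest failed because a file does not exist)

# digest-optional does not raise an exception for missing files.
println($(digest-optional $(FILES)))
```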

11.2.6  find-in-path, find-in-path-optional

    $(find-in-path path, files) : File Array
       path : Dir Array
       files : String Array
    raises RuntimeException

    $(find-in-path-optional path, files) : File Array

The find-in-path function searches for the files in a search path. Only the tail of the filename is significant. The find-in-path function raises an exception if the file can't be found. The find-in-path-optional function silently removes files that can't be found.
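
As a rough sketch, the two variants might be used like this. The directories include and src and the file names util.h and missing.h are hypothetical, and println is assumed from the OMake printing functions.

```omake
# Hypothetical search path made of two project directories.
SEARCH = $(dir include src)

# Raises a RuntimeException if util.h is in neither directory.
HDR = $(find-in-path $(SEARCH), util.h)

# Names that cannot be found are dropped instead of raising an exception.
FOUND = $(find-in-path-optional $(SEARCH), util.h missing.h)
println($(FOUND))
```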

11.2.7  digest-in-path, digest-in-path-optional

    $(digest-in-path path, files) : String/File Array
       path : Dir Array
       files : String Array
    raises RuntimeException

    $(digest-in-path-optional path, files) : String/File Array

The digest-in-path function searches for the files in a search path and returns the file and digest for each file. Only the tail of the filename is significant. The digest-in-path function raises an exception if the file can't be found. The digest-in-path-optional function silently removes elements that can't be found.

11.3  File stats

11.3.1  file-exists, target-exists, target-is-proper

   $(file-exists files) : String
   $(target-exists files) : String
   $(target-is-proper files) : String
       files : File Sequence

The file-exists function checks whether the files listed exist. The target-exists function is similar to the file-exists function, but it returns true if the file exists or if it can be built by the current project. The target-is-proper function returns true only if the file can be generated in the current project.
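
These predicates are typically used in conditionals. The following is a minimal sketch; config.h and parser.c are hypothetical names, and println is assumed from the OMake printing functions.

```omake
# Checks the filesystem only.
if $(file-exists config.h)
    println(config.h is already present)
else
    println(config.h is missing)

# True if parser.c exists or some rule in the project can build it.
if $(target-exists parser.c)
    println(parser.c exists or can be built)
```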

11.3.2  stat-reset

   $(stat-reset files) : String
       files : File Sequence

OMake uses a stat-cache. The stat-reset function resets the stat information for the given files, forcing the stat information to be recomputed the next time it is requested.

11.3.3  filter-exists, filter-targets, filter-proper-targets

   $(filter-exists files) : File Sequence
   $(filter-targets files) : File Sequence
   $(filter-proper-targets files) : File Sequence
      files : File Sequence

The filter-exists, filter-targets, and filter-proper-targets functions remove files from a list of files.

  • filter-exists: the result is the list of files that exist.
  • filter-targets: the result is the list of files that either exist or can be built by the current project.
  • filter-proper-targets: the result is the list of files that can be built in the current project.

Creating a “distclean” target

One way to create a simple “distclean” rule that removes generated files from the project is by removing all files that can be built in the current project.

CAUTION: you should be careful before you do this. The rule removes any file that can potentially be reconstructed. There is no check to make sure that the commands to rebuild the file would actually succeed. Also, note that no file outside the current project will be deleted.

    .PHONY: distclean

    distclean:
        rm $(filter-proper-targets $(ls R, .))

If you use CVS, you may wish to utilize the cvs_realclean program that is distributed with OMake in order to create a “distclean” rule that would delete all the files that are not known to CVS. For example, if you already have a more traditional “clean” target defined in your project, and if you want the “distclean” rule to be interactive by default, you can write the following:

    if $(not $(defined FORCE_REALCLEAN))
        FORCE_REALCLEAN = false
        export

    distclean: clean
        cvs_realclean $(if $(FORCE_REALCLEAN), -f) -i .omakedb -i .omakedb.lock

You can add more files that you want to always keep (such as configuration files) with the -i option.

Similarly, if you use Subversion, you can use the build/svn_realclean.om script that comes with OMake:

    if $(not $(defined FORCE_REALCLEAN))
        FORCE_REALCLEAN = false
        export

    open build/svn_realclean

    distclean: clean
        svn_realclean $(if $(FORCE_REALCLEAN), -f) -i .omakedb -i .omakedb.lock

See also the dependencies-proper function for an alternate method for removing intermediate files.

11.3.4  find-targets-in-path, find-targets-in-path-optional

    $(find-targets-in-path path, files) : File Array
    $(find-targets-in-path-optional path, files) : File Array
        path : Dir Array
        files : File Sequence

The find-targets-in-path function searches for targets in the search path. For each file in the file list, the path is searched sequentially for a directory dir such that the target dir/file exists. If so, the file dir/file is returned.

For example, suppose you are building a C project, and the project contains a subdirectory src/ containing only the files fee.c and foo.c. The following expression evaluates to the files src/fee.o src/foo.o even if the files have not already been built.

    $(find-targets-in-path lib src, fee.o foo.o)

    # Evaluates to
    src/fee.o src/foo.o

The find-targets-in-path function raises an exception if the file can't be found. The find-targets-in-path-optional function silently removes targets that can't be found.

    $(find-targets-in-path-optional lib src, fee.o foo.o fum.o)

    # Evaluates to
    src/fee.o src/foo.o

11.3.5  find-ocaml-targets-in-path-optional

The find-ocaml-targets-in-path-optional function is very similar to the find-targets-in-path-optional one, except an OCaml-style search is used, where for every element of the search path and for every name being searched for, first the uncapitalized version is tried and if it is not buildable, then the capitalized version is tried next.

11.3.6  file-sort

   $(file-sort order, files) : File Sequence
      order : String
      files : File Sequence

The file-sort function sorts a list of filenames by build order augmented by a set of sort rules. Sort rules are declared using the .ORDER target. The .BUILDORDER defines the default order.

$(file-sort <order>, <files>)

For example, suppose we have the following set of rules.

   a: b c
   b: d
   c: d

   .DEFAULT: a b c d
      echo $(file-sort .BUILDORDER, a b c d)

In this case, the sorter produces the result d b c a. That is, a target is sorted after its dependencies. The sorter is frequently used to sort files that are to be linked by their dependencies (for languages where this matters).

There are three important restrictions to the sorter:

  • The sorter can be used only within a rule body. The reason for this is that all dependencies must be known before the sort is performed.
  • The sorter can only sort files that are buildable in the current project.
  • The sorter will fail if the dependencies are cyclic.

11.3.6.1  sort rule

It is possible to further constrain the sorter through the use of sort rules. A sort rule is declared in two steps: the target must be listed as an .ORDER target, and then a set of sort rules must be given. A sort rule defines a pattern constraint.

   .ORDER: .MYORDER

   .MYORDER: %.foo: %.bar
   .MYORDER: %.bar: %.baz

   .DEFAULT: a.foo b.bar c.baz d.baz
      echo $(file-sort .MYORDER, a.foo b.bar c.baz d.baz)

In this example, the .MYORDER sort rule specifies that any file with a suffix .foo should be placed after any file with suffix .bar, and any file with suffix .bar should be placed after a file with suffix .baz.

The result of the sort is d.baz c.baz b.bar a.foo.

11.3.7  file-check-sort

   file-check-sort(order, files)
      order : String
      files : File Sequence
   raises RuntimeException

The file-check-sort function checks whether a list of files is in sort order. If so, the list is returned unchanged. If not, the function raises an exception.

$(file-check-sort <order>, <files>)

11.4  Globbing and file listings

OMake commands are “glob-expanded” before being executed. That is, names may contain patterns that are expanded to sequences of file and directory names. The syntax follows the standard bash(1) and csh(1) syntax, with the following rules.

  • A pathname is a sequence of directory and file names separated by one of the / or \ characters. For example, the following pathnames refer to the same file: /home/jyh/OMakefile and /home\jyh/OMakefile.
  • Glob-expansion is performed on the components of a path. If a path contains occurrences of special characters (listed below), the path is viewed as a pattern to be matched against the actual files in the system. The expansion produces a sequence of all file/directory names that match.

    For the following examples, suppose that a directory /dir contains files named a, -a, a.b, and b.c.

    *
    Matches any sequence of zero-or-more characters. For example, the pattern /dir/a* expands to /dir/a /dir/a.b.
    ?
    Matches exactly one character. The pattern /dir/?a expands to the filename /dir/-a.
    [...]
    Square brackets denote character sets and ranges in the ASCII character set. The pattern may contain individual characters c or character ranges c1-c2. The pattern matches any of the individual characters specified, or any characters in the range. A leading “hat” inverts the sense of the pattern. To specify a pattern that contains the literal character -, the - should occur as the first character in the range.
    Pattern          Expansion
    /dir/[a-b]*      /dir/a /dir/a.b /dir/b.c
    /dir/[-a-b]*     /dir/a /dir/-a /dir/a.b /dir/b.c
    /dir/[-a]*       /dir/a /dir/-a /dir/a.b
    {s1,...,sN}
    Braces indicate brace-expansion. The braces delimit a sequence of strings separated by commas. Given N strings, the result produces N copies of the pattern, one for each of the strings si.
    Pattern            Expansion
    a{b,c,d}           ab ac ad
    a{b{c,d},e}        abc abd ae
    a{?{[A-Z],d},*}    a?[A-Z] a?d a*
    ~
    The tilde is used to specify home directories. Depending on your system, these might be possible expansions.
    Pattern            Expansion
    ~jyh               /home/jyh
    ~bob/*.c           c:\Documents and Settings\users\bob

    The \ character is both a pathname separator and an escape character. If followed by a special glob character, the \ changes the sense of the following character to non-special status. Otherwise, \ is viewed as a pathname separator.
    Pattern                   Expansion
    ~jyh/\*                   ~jyh/* (* is literal)
    /dir/\[a-z?               /dir/[a-z? ([ is literal, ? is a pattern)
    c:\Program Files\[A-z]    c:\Program Files[A-z]*

    Note that the final case might be considered ambiguous (the \ should be viewed as a pathname separator, not as an escape for the subsequent [ character). If you want to avoid this ambiguity on Win32, you should use the forward slash / even for Win32 pathnames (the / is translated to \ in the output).

    Pattern                    Expansion
    c:/Program Files/[A-z]*    c:\Program Files\WindowsUpdate ...

11.4.1  glob

   $(glob strings) : Node Array
      strings : String Sequence
   $(glob options, strings) : Node Array
      options : String
      strings : String Sequence

The glob function performs glob-expansion.

The . and .. entries are always ignored.

The options are:

b
Do not perform csh(1)-style brace expansion.
e
The \ character does not escape special characters.
n
If an expansion fails, return the expansion literally instead of aborting.
i
If an expansion fails, it expands to nothing.
.
Allow wildcard patterns to match files beginning with a .
A
Return all files, including files that begin with a .
F
Match only normal files (any file that is not a directory).
D
Match only directory files.
C
Ignore files according to cvs(1) rules.
P
Include only proper subdirectories.

In addition, the following variables may be defined that affect the behavior of glob.

GLOB_OPTIONS
A string containing default options.
GLOB_IGNORE
A list of shell patterns for filenames that glob should ignore.
GLOB_ALLOW
A list of shell patterns. If a file does not match a pattern in GLOB_ALLOW, it is ignored.

The returned files are sorted by name.
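
A few typical calls are sketched below. The directory names src and generated are hypothetical, the ignore patterns are only an example, and println is assumed from the OMake printing functions.

```omake
# Option F: match only normal files under src.
SRC = $(glob F, src/*.c)

# Option i: a pattern that matches nothing expands to nothing.
GEN = $(glob i, generated/*.c)

# Patterns listed in GLOB_IGNORE are skipped by glob.
GLOB_IGNORE = *~ *.bak
REST = $(glob *)

println($(SRC) $(GEN))
```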

11.4.2  ls

   $(ls files) : Node Array
      files : String Sequence
   $(ls options, files) : Node Array
      options : String
      files : String Sequence

The ls function returns the filenames in a directory.

The . and .. entries are always ignored. The patterns are shell-style patterns, and are glob-expanded.

The options include all of the options to the glob function, plus the following.

R
Perform a recursive listing.

The GLOB_ALLOW and GLOB_IGNORE variables can be defined to control the globbing behavior. The returned files are sorted by name.

11.4.3  subdirs

   $(subdirs dirs) : Dir Array
      dirs : String Sequence
   $(subdirs options, dirs) : Dir Array
      options : String
      dirs : String Sequence

The subdirs function returns all the subdirectories of a list of directories, recursively.

The possible options are the following:

A
Return directories that begin with a .
C
Ignore files according to .cvsignore rules.
P
Include only proper subdirectories.
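
For instance, a sketch of collecting subdirectories might look as follows. The directory names src and lib are hypothetical, and println is assumed from the OMake printing functions.

```omake
# All subdirectories of src and lib, recursively.
DIRS = $(subdirs src lib)

# Option P: keep only proper subdirectories.
PROPER = $(subdirs P, src lib)

println($(PROPER))
```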

11.5  Filesystem operations

11.5.1  mkdir

   mkdir(mode, node...)
      mode : Int
      node : Node
   raises RuntimeException

   mkdir(node...)
      node : Node
   raises RuntimeException

The mkdir function creates a directory, or a set of directories. The following options are supported.

-m mode
Specify the permissions of the created directory.
-p
Create parent directories if they do not exist.
--
Interpret the remaining names literally.
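
A minimal sketch of the -p option, assuming a hypothetical build/obj output directory; the not function is the one used in the distclean examples earlier in this chapter.

```omake
# Create build/obj, including the parent build directory, only if it is missing.
if $(not $(file-exists build/obj))
    mkdir(-p build/obj)
```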

11.5.2  Stat

The Stat object represents information about a filesystem node, as returned by the stat and lstat functions. It contains the following fields.

dev
: the device number.
ino
: the inode number.
kind
: the kind of the file, one of the following: REG (regular file), DIR (directory), CHR (character device), BLK (block device), LNK (symbolic link), FIFO (named pipe), SOCK (socket).
perm
: access rights, represented as an integer.
nlink
: number of links.
uid
: user id of the owner.
gid
: group id of the file's group.
rdev
: device minor number.
size
: size in bytes.
atime
: last access time, as a floating point number.
mtime
: last modification time, as a floating point number.
ctime
: last status change time, as a floating point number.

Not all of the fields will have meaning on all operating systems.

11.5.3  stat, lstat

    $(stat node...) : Stat
       node : Node or Channel
    $(lstat node...) : Stat
       node : Node or Channel
    raises RuntimeException

The stat functions return file information. If the file is a symbolic link, the stat function refers to the destination of the link; the lstat function refers to the link itself.
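
The fields of the returned Stat object can be read with the usual object field syntax. The sketch below uses OMakefile only because that file is likely to exist in a project directory; println is assumed from the OMake printing functions.

```omake
# Information about the file (or the destination, if it is a symbolic link).
s = $(stat OMakefile)
println($(s.kind))
println($(s.size))

# Information about the node itself, without following a symbolic link.
l = $(lstat OMakefile)
println($(l.kind))
```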

11.5.4  unlink

   $(unlink file...)
      file : File
   $(rm file...)
      file : File
   $(rmdir dir...)
      dir : Dir
   raises RuntimeException

The unlink and rm functions remove a file. The rmdir function removes a directory.

The following options are supported for rm and rmdir.

-f
ignore nonexistent files, never prompt.
-i
prompt before removal.
-r
remove the contents of directories recursively.
-v
explain what is going on.
--
the rest of the values are interpreted literally.

11.5.5  rename

    rename(old, new)
       old : Node
       new : Node
    mv(nodes... dir)
       nodes : Node Sequence
       dir   : Dir
    cp(nodes... dir)
       nodes : Node Sequence
       dir   : Dir
    raises RuntimeException

The rename function changes the name of a file or directory named old to new.

The mv function is similar, but if new is a directory, and it exists, then the files specified by the sequence are moved into the directory. If not, the behavior of mv is identical to rename. The cp function is similar, but the original file is not removed.

The mv and cp functions take the following options.

-f
Do not prompt before overwriting.
-i
Prompt before overwriting.
-v
Explain what is happening.
-r
Copy the contents of directories recursively.
--
Interpret the remaining arguments literally.

11.5.6  link

   link(src, dst)
      src : Node
      dst : Node
   raises RuntimeException

The link function creates a hard link named dst to the file or directory src.

Hard links may work under Win32 when NTFS is used.

Normally, only the superuser can create hard links to directories.

11.5.7  symlink

   symlink(src, dst)
      src : Node
      dst : Node
   raises RuntimeException

The symlink function creates a symbolic link dst that points to the src file.

The link name is computed relative to the target directory. For example, the expression $(symlink a/b, c/d) creates a link named c/d -> ../a/b.

Symbolic links are not supported in Win32. Consider using the ln-or-cp Shell alias for cross-platform portable linking/copying.

11.5.8  readlink

   $(readlink node...) : Node
      node : Node

The readlink function reads the value of a symbolic link.

11.5.9  chmod

   chmod(mode, dst...)
      mode : Int
      dst : Node or Channel
   chmod(mode dst...)
      mode : String
      dst : Node Sequence
   raises RuntimeException

The chmod function changes the permissions of the targets.

Options:

-v
Explain what is happening.
-r
Change files and directories recursively.
-f
Continue on errors.
--
Interpret the remaining arguments literally.

11.5.10  chown

   chown(uid, gid, node...)
      uid : Int
      gid : Int
      node : Node or Channel
   chown(uid, node...)
      uid : Int
      node : Node or Channel
   raises RuntimeException

The chown function changes the user and group id of the file. If the gid is not specified, it is not changed. If either id is -1, that id is not changed.

11.5.11  truncate

   truncate(length, node...)
       length : Int
       node : Node or Channel
   raises RuntimeException

The truncate function truncates a file to the given length.

11.5.12  umask

    $(umask mode) : Int
       mode : Int
    raises RuntimeException

Sets the file mode creation mask. The previous mask is returned. This value is not scoped; changes have global effect.

11.6  vmount

11.6.1  vmount

    vmount(src, dst)
       src, dst : Dir
    vmount(flags, src, dst)
       flags : String
       src, dst : Dir

“Mount” the src directory on the dst directory. This is a virtual mount, changing the behavior of the $(file ...) function. When the $(file str) function is used, the resulting file is taken relative to the src directory if the file exists. Otherwise, the file is relative to the current directory.

The main purpose of the vmount function is to support multiple builds with separate configurations or architectures.

The options are as follows.

l
Create symbolic links to files in the src directory.
c
Copy files from the src directory.

Mount operations are scoped.

11.6.2  add-project-directories

    add-project-directories(dirs)
       dirs : Dir Array

Add the directories to the set of directories that omake considers to be part of the project. This is mainly used to avoid omake complaining that the current directory is not part of the project.

11.6.3  remove-project-directories

    remove-project-directories(dirs)
       dirs : Dir Array

Remove the directories from the set of directories that omake considers to be part of the project. This is mainly used to cancel a .SUBDIRS from including a directory if it is determined that the directory does not need to be compiled.

11.7  File predicates

11.7.1  test

   test(exp) : Bool
      exp : String Sequence

The expression grammar is as follows:

  • ! expression : expression is not true
  • expression1 -a expression2 : both expressions are true
  • expression1 -o expression2 : at least one expression is true
  • ( expression ) : expression is true

The base expressions are:

  • -n string : The string has nonzero length
  • -z string : The string has zero length
  • string = string : The strings are equal
  • string != string : The strings are not equal
  • int1 -eq int2 : The integers are equal
  • int1 -ne int2 : The integers are not equal
  • int1 -gt int2 : int1 is larger than int2
  • int1 -ge int2 : int2 is not larger than int1
  • int1 -lt int2 : int1 is smaller than int2
  • int1 -le int2 : int1 is not larger than int2
  • file1 -ef file2 : On Unix, file1 and file2 have the same device and inode number. On Win32, file1 and file2 have the same name.
  • file1 -nt file2 : file1 is newer than file2
  • file1 -ot file2 : file1 is older than file2
  • -b file : The file is a block special file
  • -c file : The file is a character special file
  • -d file : The file is a directory
  • -e file : The file exists
  • -f file : The file is a normal file
  • -g file : The set-group-id bit is set on the file
  • -G file : The file's group is the current effective group
  • -h file : The file is a symbolic link (also -L)
  • -k file : The file's sticky bit is set
  • -L file : The file is a symbolic link (also -h)
  • -O file : The file's owner is the current effective user
  • -p file : The file is a named pipe
  • -r file : The file is readable
  • -s file : The file is non-empty
  • -S file : The file is a socket
  • -u file : The set-user-id bit is set on the file
  • -w file : The file is writable
  • -x file : The file is executable

A string is any sequence of characters; leading - characters are allowed.

An int is a string that can be interpreted as an integer. Unlike traditional versions of the test program, the leading characters may specify a radix. The prefix 0b means the number is in binary; the prefix 0o means the number is in octal; the prefix 0x means the number is in hexadecimal. An int can also be specified as -l string, which evaluates to the length of the string.

A file is a string that represents the name of a file.

The syntax mirrors that of the test(1) program. If you are on a Unix system, the man page explains more. Here are some examples.

    # Create an empty file
    osh> touch foo
    # Does the file exist?
    osh> test(-e foo)
    - : true
    osh> test(! -e foo)
    - : false
    # Create another file
    osh> touch boo
    # Is the newer file newer?
    osh> test(boo -nt foo)
    - : true
    # A more complex query
    # boo is newer than foo, and foo exists
    osh> test(\( boo -nt foo \) -a -e foo)
    - : true
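
Outside of osh, the same expressions can guard a conditional in an OMakefile. The following is a small sketch; the directory name build is hypothetical, and println is assumed from the OMake printing functions.

```omake
# True only if build is a directory and is writable.
if $(test -d build -a -w build)
    println(build is a writable directory)
else
    println(build is missing or not writable)
```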

11.7.2  find

   find(exp) : Node Array
      exp : String Sequence

The find function searches a directory recursively, returning the files for which the expression evaluates to true.

The expression argument uses the same syntax as the test function, with the following exceptions.

  1. The expression may begin with a directory. If not specified, the current directory is searched.
  2. The {} string expands to the current file being examined.

The syntax of the expression is the same as test, with the following additions.

  • -name string : The current file matches the glob expression (see Section 11.4).

The find function performs a recursive scan of all subdirectories. The following call is being run from the root of the omake source directory.

    osh> find(. -name fo* )
    - : <array
            /home/jyh/.../omake/mk/.svn/format
            /home/jyh/.../omake/RPM/.svn/format
            ...
            /home/jyh/.../omake/osx_resources/installer_files/.svn/format>

The following example lists only those files that are normal files or symbolic links.

    osh> find(. -name fo* -a \( -f {} -o -L {} \))
    - : <array
            /home/jyh/.../omake/mk/.svn/format
            /home/jyh/.../omake/RPM/.svn/format
            ...
            /home/jyh/.../omake/osx_resources/installer_files/.svn/format>

11.8  IO functions

11.8.1  Standard channels

The following variables define the standard channels.

stdin

stdin : InChannel

The standard input channel, open for reading.

stdout

stdout : OutChannel

The standard output channel, open for writing.

stderr

stderr : OutChannel

The standard error channel, open for writing.

11.8.2  open-in-string

The open-in-string function treats a string as if it were a file and returns a channel for reading.

   $(open-in-string s) : Channel
       s : String

11.8.3  open-out-string, out-contents

The open-out-string function creates a channel that writes to a string instead of a file. The string may be retrieved with the out-contents function.

   $(open-out-string) : Channel
   $(out-contents chan) : String
       chan : OutChannel
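
As a sketch, output can be collected in a string like this. The fprintln and println functions are assumed from the OMake printing functions, and close is described later in this chapter.

```omake
# Collect output in memory instead of in a file.
chan = $(open-out-string)
fprintln($(chan), first line)
fprintln($(chan), second line)

# Retrieve what has been written so far, then release the channel.
text = $(out-contents $(chan))
close($(chan))

println($(text))
```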

11.8.4  fopen

The fopen function opens a file for reading or writing.

   $(fopen file, mode) : Channel
      file : File
      mode : String

The file is the name of the file to be opened. The mode is a combination of the following characters.

r
Open the file for reading; it is an error if the file does not exist.
w
Open the file for writing; the file is created if it does not exist.
a
Open the file in append mode; the file is created if it does not exist.
+
Open the file for both reading and writing.
t
Open the file in text mode (default).
b
Open the file in binary mode.
n
Open the file in nonblocking mode.
x
Fail if the file already exists.

Binary mode is not significant on Unix systems, where text and binary modes are equivalent.

11.8.5  close

    $(close channel...)
       channel : Channel

The close function closes a file that was previously opened with fopen.
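
The following sketch shows one way fopen and close might be paired. It writes a line to a file, closes the channel, and then reopens the file to read the line back. The file name example.txt and the variable names are hypothetical.

    # Write one line to a new file (example.txt is a hypothetical name).
    out = $(fopen example.txt, w)
    fprintln($(out), first line)
    close($(out))

    # Reopen the file for reading and fetch the line back.
    inp = $(fopen example.txt, r)
    line = $(gets $(inp))
    close($(inp))
    println($(line))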

11.8.6  read

   $(read channel, amount) : String
      channel : InChannel
      amount  : Int
   raises RuntimeException

The read function reads up to amount bytes from an input channel, and returns the data that was read. If an end-of-file condition is reached, the function raises a RuntimeException exception.

11.8.7  write

   $(write channel, buffer, offset, amount) : String
      channel : OutChannel
      buffer  : String
      offset  : Int
      amount  : Int
   $(write channel, buffer) : String
      channel : OutChannel
      buffer  : String
   raises RuntimeException

In the 4-argument form, the write function writes bytes to the output channel channel from the buffer, starting at position offset. Up to amount bytes are written. The function returns the number of bytes that were written.

The 3-argument form is similar, but the offset is 0.

In the 2-argument form, the offset is 0, and the amount is the length of the buffer.

If an end-of-file condition is reached, the function raises a RuntimeException exception.

11.8.8  lseek

    $(lseek channel, offset, whence) : Int
       channel : Channel
       offset  : Int
       whence  : String
    raises RuntimeException

The lseek function repositions the offset of the channel channel according to the whence directive, as follows:

SEEK_SET
The offset is set to offset.
SEEK_CUR
The offset is set to its current position plus offset bytes.
SEEK_END
The offset is set to the size of the file plus offset bytes.

The lseek function returns the new position in the file.

11.8.9  rewind

   rewind(channel...)
      channel : Channel

The rewind function sets the current file position to the beginning of the file.

11.8.10  tell

    $(tell channel...) : Int...
       channel : Channel
    raises RuntimeException

The tell function returns the current position of the channel.
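
The positioning functions can be combined as in the sketch below, which assumes a hypothetical existing file data.bin. Seeking to offset 0 relative to SEEK_END returns the size of the file, and rewind followed by tell reports the position at the start of the file.

    chan = $(fopen data.bin, r)
    # Seek to the end; the returned position is the file size in bytes.
    size = $(lseek $(chan), 0, SEEK_END)
    println($"size: $(size)")
    # Return to the start and report the current position.
    rewind($(chan))
    println($"position after rewind: $(tell $(chan))")
    close($(chan))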

11.8.11  flush

   $(flush channel...)
      channel : OutChannel

The flush function can be used only on files that are open for writing. It flushes all pending data to the file.

11.8.12  dup

    $(dup channel) : Channel
       channel : Channel
    raises RuntimeException

The dup function returns a new channel referencing the same file as the argument.

11.8.13  dup2

   dup2(channel1, channel2)
      channel1 : Channel
      channel2 : Channel
   raises RuntimeException

The dup2 function causes channel2 to refer to the same file as channel1.

11.8.14  set-nonblock-mode

   set-nonblock-mode(mode, channel...)
      channel : Channel
      mode : String

The set-nonblock-mode function sets the nonblocking flag on the given channels. When IO is performed on a channel and the operation cannot be completed immediately, the operation raises a RuntimeException.

11.8.15  set-close-on-exec-mode

   set-close-on-exec-mode(mode, channel...)
      channel : Channel
      mode : String
   raises RuntimeException

The set-close-on-exec-mode function sets the close-on-exec flags for the given channels. If the close-on-exec flag is set, the channel is not inherited by child processes. Otherwise it is.

11.8.16  pipe

   $(pipe) : Pipe
   raises RuntimeException

The pipe function creates a Pipe object, which has two fields. The read field is a channel that is opened for reading, and the write field is a channel that is opened for writing.
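
A minimal sketch of the two fields in use is shown below. It sends a short message through the pipe within a single process. It assumes that closing the write end flushes any buffered data, and that the message is small enough to fit in the pipe buffer so that the write does not block.

    p = $(pipe)
    # Write a short message, then close the write end.
    fprintln($(p.write), ping)
    close($(p.write))
    # Read the message back from the read end.
    println($(gets $(p.read)))
    close($(p.read))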

11.8.17  mkfifo

   mkfifo(mode, node...)
      mode : Int
      node : Node

The mkfifo function creates a named pipe.

11.8.18  select

   $(select rfd..., wfd..., efd..., timeout) : Select
      rfd : InChannel
      wfd : OutChannel
      efd : Channel
      timeout : float
   raises RuntimeException

The select function polls for possible IO on a set of channels. The rfd are a sequence of channels for reading, wfd are a sequence of channels for writing, and efd are a sequence of channels to poll for error conditions. The timeout specifies the maximum amount of time to wait for events.

On successful return, select returns a Select object, which has the following fields:

read
An array of channels available for reading.
write
An array of channels available for writing.
error
An array of channels on which an error has occurred.

11.8.19  lockf

    lockf(channel, command, len)
       channel : Channel
       command : String
       len : Int
    raises RuntimeException

The lockf function places a lock on a region of the channel. The region starts at the current position and extends for len bytes.

The possible values for command are the following.

F_ULOCK
Unlock a region.
F_LOCK
Lock a region for writing; block if already locked.
F_TLOCK
Lock a region for writing; fail if already locked.
F_TEST
Test a region for other locks.
F_RLOCK
Lock a region for reading; block if already locked.
F_TRLOCK
Lock a region for reading; fail if already locked.
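
Because the locked region starts at the current position, the position must be the same when the region is unlocked. The sketch below illustrates this under the assumption that a hypothetical file shared.dat already exists. The length of 16 bytes is arbitrary.

    chan = $(fopen shared.dat, r+)
    # Lock the first 16 bytes for writing; this blocks while another lock is held.
    lockf($(chan), F_LOCK, 16)
    fprint($(chan), updated)
    flush($(chan))
    # The write moved the position, so rewind before unlocking the same region.
    rewind($(chan))
    lockf($(chan), F_ULOCK, 16)
    close($(chan))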

11.8.20  InetAddr

The InetAddr object describes an Internet address. It contains the following fields.

addr
String: the Internet address.
port
Int: the port number.

11.8.21  Host

A Host object contains the following fields.

name
String: the name of the host.
aliases
String Array: other names by which the host is known.
addrtype
String: the preferred socket domain.
addrs
InetAddr Array: an array of Internet addresses belonging to the host.

11.8.22  gethostbyname

   $(gethostbyname host...) : Host...
      host : String
   raises RuntimeException

The gethostbyname function returns a Host object for the specified host. The host may specify a domain name or an Internet address.
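
For illustration, the following sketch looks up a host and prints two of the fields of the returned Host object. The host name localhost is only an example, and the variable name host is hypothetical.

    host = $(gethostbyname localhost)
    println($"name: $(host.name)")
    println($"aliases: $(host.aliases)")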

11.8.23  Protocol

The Protocol object represents a protocol entry. It has the following fields.

name
String: the canonical name of the protocol.
aliases
String Array: aliases for the protocol.
proto
Int: the protocol number.

11.8.24  getprotobyname

   $(getprotobyname name...) : Protocol...
      name : Int or String
   raises RuntimeException

The getprotobyname function returns a Protocol object for the specified protocol. The name may be a protocol name, or a protocol number.

11.8.25  Service

The Service object represents a network service. It has the following fields.

name
String: the name of the service.
aliases
String Array: aliases for the service.
port
Int: the port number of the service.
proto
Protocol: the protocol for the service.

11.8.26  getservbyname

   $(getservbyname service...) : Service...
      service : String or Int
   raises RuntimeException

The getservbyname function gets the information for a network service. The service may be specified as a service name or number.

11.8.27  socket

   $(socket domain, type, protocol) : Channel
      domain : String
      type : String
      protocol : String
   raises RuntimeException

The socket function creates an unbound socket.

The possible values for the arguments are as follows.

The domain may have the following values.

PF_UNIX or unix
Unix domain, available only on Unix systems.
PF_INET or inet
Internet domain, IPv4.
PF_INET6 or inet6
Internet domain, IPv6.

The type may have the following values.

SOCK_STREAM or stream
Stream socket.
SOCK_DGRAM or dgram
Datagram socket.
SOCK_RAW or raw
Raw socket.
SOCK_SEQPACKET or seqpacket
Sequenced packets socket.

The protocol is an Int or String that specifies a protocol in the protocols database.

11.8.28  bind

   bind(socket, host, port)
      socket : InOutChannel
      host : String
      port : Int
   bind(socket, file)
      socket : InOutChannel
      file : File
   raises RuntimeException

The bind function binds a socket to an address.

The 3-argument form specifies an Internet connection. The host specifies a host name or IP address, and the port is a port number.

The 2-argument form is for Unix sockets. The file specifies the filename for the address.

11.8.29  listen

   listen(socket, requests)
      socket : InOutChannel
      requests : Int
   raises RuntimeException

The listen function sets up the socket for receiving up to requests number of pending connection requests.

11.8.30  accept

   $(accept socket) : InOutChannel
      socket : InOutChannel
   raises RuntimeException

The accept function accepts a connection on a socket.
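
The socket, bind, listen, and accept functions are typically used together on the server side. The sketch below handles a single connection and echoes one line back to the client. The host localhost, the port 8080, and the backlog of 5 are arbitrary assumptions, as are the variable names.

    # Create a TCP socket and bind it to a local address.
    srv = $(socket inet, stream, tcp)
    bind($(srv), localhost, 8080)
    listen($(srv), 5)
    # Wait for one client, read a line, and echo it back.
    conn = $(accept $(srv))
    line = $(gets $(conn))
    fprintln($(conn), $"echo: $(line)")
    close($(conn))
    close($(srv))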

11.8.31  connect

    connect(socket, addr, port)
       socket : InOutChannel
       addr : String
       port : Int
    connect(socket, name)
       socket : InOutChannel
       name : File
    raises RuntimeException

The connect function connects a socket to a remote address.

The 3-argument form specifies an Internet connection. The addr argument is the Internet address of the remote host, specified as a domain name or IP address. The port argument is the port number.

The 2-argument form is for Unix sockets. The name argument is the filename of the socket.
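
A client might use the 3-argument form as in the following sketch. It assumes that some server is listening on localhost port 8080 and that the server replies with a single line. Both the address and the exchange are hypothetical.

    sock = $(socket inet, stream, tcp)
    connect($(sock), localhost, 8080)
    # Send one line and flush it so that the server sees it immediately.
    fprintln($(sock), hello)
    flush($(sock))
    # Print the reply from the server.
    println($(gets $(sock)))
    close($(sock))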

11.8.32  getc

    $(getc) : String
    $(getc file) : String
       file : InChannel or File
    raises RuntimeException

The getc function returns the next character of a file. If the argument is not specified, stdin is used as input. If the end of file has been reached, the function returns false.

11.8.33  gets

   $(gets) : String
   $(gets channel) : String
      channel : InChannel or File
   raises RuntimeException

The gets function returns the next line from a file. The function returns the empty string if the end of file has been reached. The line terminator is removed.

11.8.34  fgets

   $(fgets) : String
   $(fgets channel) : String
      channel : InChannel or File
   raises RuntimeException

The fgets function returns the next line from a file that has been opened for reading with fopen. The function returns the empty string if the end of file has been reached. The string is returned as literal data, and the line terminator is not removed.

11.9  Printing functions

Output is printed with the print and println functions. The println function adds a terminating newline to the value being printed; the print function does not.

    fprint(<file>, <string>)
    print(<string>)
    eprint(<string>)
    fprintln(<file>, <string>)
    println(<string>)
    eprintln(<string>)

The fprint functions print to a file that has been previously opened with fopen. The print functions print to the standard output channel, and the eprint functions print to the standard error channel.
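
The three families can be contrasted in a short sketch. The log file name build.log and the messages are hypothetical.

    # Standard output and standard error.
    println(Build started)
    eprintln($"warning: no tests were found")
    # A file channel opened in append mode.
    log = $(fopen build.log, a)
    fprintln($(log), $"Build started")
    close($(log))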

11.10  Value printing functions

Values can be printed with the printv and printvln functions. The printvln function adds a terminating newline to the value being printed; the printv function does not.

    fprintv(<file>, <string>)
    printv(<string>)
    eprintv(<string>)
    fprintvln(<file>, <string>)
    printvln(<string>)
    eprintvln(<string>)

The fprintv functions print to a file that has been previously opened with fopen. The printv functions print to the standard output channel, and the eprintv functions print to the standard error channel.

11.10.1  Miscellaneous functions

11.10.1.1  set-channel-line

    set-channel-line(channel, filename, line)
        channel : Channel
        filename : File
        line : Int

Set the line number information for the channel.

11.11  Higher-level IO functions

11.11.1  Regular expressions

Many of the higher-level functions use regular expressions. Regular expressions are defined by strings with syntax nearly identical to awk(1).

Strings may contain the following character constants.

  • \\ : a literal backslash.
  • \a : the alert character ^G.
  • \b : the backspace character ^H.
  • \f : the formfeed character ^L.
  • \n : the newline character ^J.
  • \r : the carriage return character ^M.
  • \t : the tab character ^I.
  • \v : the vertical tab character.
  • \xhh... : the character represented by the string of hexadecimal digits h. All valid hexadecimal digits following the sequence are considered to be part of the sequence.
  • \ddd : the character represented by 1, 2, or 3 octal digits.

Regular expressions are defined using the special characters .\^$[(){}*?+.

  • c : matches the literal character c if c is not a special character.
  • \c : matches the literal character c, even if c is a special character.
  • . : matches any character, including newline.
  • ^ : matches the beginning of a line.
  • $ : matches the end of line.
  • [abc...] : matches any of the characters abc...
  • [^abc...] : matches any character except abc...
  • r1|r2 : matches either r1 or r2.
  • r1r2 : matches r1 and then r2.
  • r+ : matches one or more occurrences of r.
  • r* : matches zero or more occurrences of r.
  • r? : matches zero or one occurrence of r.
  • (r) : parentheses are used for grouping; matches r.
  • \(r\) : also defines grouping, but the expression matched within the parentheses is available to the output processor through one of the variables $1, $2, ...
  • r{n} : matches exactly n occurrences of r.
  • r{n,} : matches n or more occurrences of r.
  • r{n,m} : matches at least n occurrences of r, and no more than m occurrences.
  • \y: matches the empty string at either the beginning or end of a word.
  • \B: matches the empty string within a word.
  • \<: matches the empty string at the beginning of a word.
  • \>: matches the empty string at the end of a word.
  • \w: matches any character in a word.
  • \W: matches any character that does not occur within a word.
  • \`: matches the empty string at the beginning of a file.
  • \': matches the empty string at the end of a file.

Character classes can be used to specify character sequences abstractly. Some of these sequences can change depending on your LOCALE.

  • [:alnum:] Alphanumeric characters.
  • [:alpha:] Alphabetic characters.
  • [:lower:] Lowercase alphabetic characters.
  • [:upper:] Uppercase alphabetic characters.
  • [:cntrl:] Control characters.
  • [:digit:] Numeric characters.
  • [:xdigit:] Numeric and hexadecimal characters.
  • [:graph:] Characters that are printable and visible.
  • [:print:] Characters that are printable, whether they are visible or not.
  • [:punct:] Punctuation characters.
  • [:blank:] Space or tab characters.
  • [:space:] Whitespace characters.

11.11.2  cat

    cat(files) : Sequence
       files : File or InChannel Sequence

The cat function concatenates the output from multiple files and returns it as a string.
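
For example, the following sketch reads two files into a single value and prints it. The file names header.txt and body.txt are hypothetical.

    text = $(cat header.txt body.txt)
    println($(text))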

11.11.3  grep

   grep(pattern) : String  # input from stdin, default options
      pattern : String
   grep(pattern, files) : String  # default options
      pattern : String
      files   : File Sequence
   grep(options, pattern, files) : String
     options : String
     pattern : String
     files   : File Sequence

The grep function searches for occurrences of a regular expression pattern in a set of files, and prints lines that match. This is like a highly-simplified version of grep(1).

The options are:

q
If specified, the output from grep is not displayed.
h
If specified, output lines will not include the filename (default, when only one input file is given).
n
If specified, output lines include the filename (default, when more than one input file is given).
v
If specified, search for lines without a match instead of lines with a match.

The pattern is a regular expression.

If successful (grep found a match), the function returns true. Otherwise, it returns false.
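
Since the result is a Boolean value, grep can be used directly as a condition. The sketch below uses the q option to suppress output and only tests whether a match exists. The file name notes.txt and the pattern are hypothetical.

    if $(grep q, $"TODO", notes.txt)
        println(notes.txt contains at least one TODO)
    else
        println(no TODO entries found)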

11.11.4  scan

   scan(input-files)
   case string1
      body1
   case string2
      body2
   ...
   default
      bodyd

The scan function provides input processing in command-line form. The function takes file/filename arguments. If called with no arguments, the input is taken from stdin. If arguments are provided, each specifies an InChannel, or the name of a file for input. Output is always to stdout.

The scan function operates by reading the input one line at a time, and processing it according to the following algorithm.

For each line, the record is first split into fields, and the fields are bound to the variables $1, $2, .... The variable $0 is defined to be the entire line, and $* is an array of all the field values. The $(NF) variable is defined to be the number of fields.

Next, a case expression is selected. If string_i matches the token $1, then body_i is evaluated. If the body ends in an export, the state is passed to the next clause. Otherwise the value is discarded.

For example, here is a scan function that acts as a simple command processor.

    calc() =
       i = 0
       scan(script.in)
       case print
          println($i)
       case inc
          i = $(add $i, 1)
          export
       case dec
          i = $(sub $i, 1)
          export
       case addconst
          i = $(add $i, $2)
          export
       default
          eprintln($"Unknown command: $1")

The scan function also supports several options.

    scan(options, files)
    ...
A
Parse each line as an argument list, where arguments may be quoted. For example, the following line has three words, “ls”, “-l”, “Program Files”.
       ls -l "Program Files"
   
O
Parse each line using white space as the separator, using the usual OMake algorithm for string parsing. This is the default.
x
Once each line is split, reduce each word using the hex representation. This is the usual hex representation used in URL specifiers, so the string “Program Files” may be alternately represented in the form Program%20Files or Program+Files.
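
As a sketch of the A option, the following fragment processes a hypothetical file commands.txt in which arguments may be quoted. For an input line such as run "Program Files" now, the quoted text is expected to be bound to $2 as a single field.

    scan(A, commands.txt)
    case run
       println($"program: $2, number of fields: $(NF)")
    default
       eprintln($"Unknown command: $1")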

Note, if you want to redirect the output to a file, the easiest way is to redefine the stdout variable. The stdout variable is scoped the same way as other variables, so this definition does not affect the meaning of stdout outside the calc function.

    calc() =
        stdout = $(fopen script.out, w)
        scan(script.in)
           ...
        close(stdout)

11.11.5  awk

   awk(input-files)
   case pattern1:
      body1
   case pattern2:
      body2
   ...
   default:
      bodyd

or

   awk(options, input-files)
   case pattern1:
      body1
   case pattern2:
      body2
   ...
   default:
      bodyd

The awk function provides input processing similar to awk(1), but more limited. The input-files argument is a sequence of values, each specifies an InChannel, or the name of a file for input. If called with no options and no file arguments, the input is taken from stdin. Output is always to stdout.

The variables RS and FS define record and field separators as regular expressions. The default value of RS is the regular expression \r|\n|\r\n. The default value of FS is the regular expression [ \t]+.

The awk function operates by reading the input one record at a time, and processing it according to the following algorithm.

For each line, the record is first split into fields using the field separator FS, and the fields are bound to the variables $1, $2, .... The variable $0 is defined to be the entire line, and $* is an array of all the field values. The $(NF) variable is defined to be the number of fields.

Next, the cases are evaluated in order. For each case, if the regular expression pattern_i matches the record $0, then body_i is evaluated. If the body ends in an export, the state is passed to the next clause. Otherwise the value is discarded. If the regular expression contains \(r\) expressions, those expressions override the fields $1, $2, ....

For example, here is an awk function to print the text between two delimiters \begin{<name>} and \end{<name>}, where the <name> must belong to a set passed as an argument to the filter function.

    filter(names) =
       print = false

       awk(Awk.in)
       case $"^\\end\{\([[:alpha:]]+\)\}"
          if $(mem $1, $(names))
             print = false
             export
          export
       default
          if $(print)
             println($0)
       case $"^\\begin\{\([[:alpha:]]+\)\}"
          print = $(mem $1, $(names))
          export

Note, if you want to redirect the output to a file, the easiest way is to redefine the stdout variable. The stdout variable is scoped the same way as other variables, so this definition does not affect the meaning of stdout outside the filter function.

    filter(names) =
        stdout = $(fopen file.out, w)
        awk(Awk.in)
           ...
        close(stdout)

Options.

b
“Break” when evaluating cases. Only the first case that matches will be selected.

The break function can be used to abort the loop, exiting the awk function immediately.

11.11.6  fsubst

   fsubst(files)
   case pattern1 [options]
      body1
   case pattern2 [options]
      body2
   ...
   default
      bodyd

The fsubst function provides a sed(1)-like substitution function. Similar to awk, if fsubst is called with no arguments, the input is taken from stdin. If arguments are provided, each specifies an InChannel, or the name of a file for input.

The RS variable defines a regular expression that determines a record separator. The default value of RS is the regular expression \r|\n|\r\n.

The fsubst function reads the file one record at a time.

For each record, the cases are evaluated in order. Each case defines a substitution from a substring matching the pattern to replacement text defined by the body.

Currently, there is only one option: g. If specified, each clause specifies a global replacement, and all instances of the pattern define a substitution. Otherwise, the substitution is applied only once.

Output can be redirected by redefining the stdout variable.

For example, the following program replaces all occurrences of an expression word. with its capitalized form.

    section
       stdout = $(fopen Subst.out, w)
       fsubst(Subst.in)
       case $"\<\([[:alnum:]]+\)\." g
          value $(capitalize $1).
       close(stdout)

11.11.7  lex

   lex(files)
   case pattern1
      body1
   case pattern2
      body2
   ...
   default
      bodyd

The lex function provides a simple lexical-style scanner function. The input is a sequence of files or channels. The cases specify regular expressions. Each time the input is read, the regular expression that matches the longest prefix of the input is selected, and the body is evaluated.

If two clauses both match the same input, the last one is selected for execution. The default case matches the regular expression .; you probably want to place it first in the pattern list.

If the body ends with an export directive, the state is passed to the next clause.

For example, the following program collects all occurrences of alphanumeric words in an input file.

    collect-words($(files)) =
       words[] =
       lex($(files))
       default
          # empty
       case $"[[:alnum:]]+" g
          words[] += $0
          export

The default case, if one exists, matches single characters.

It is an error if the input does not match any of the regular expressions.

The break function can be used to abort the loop.

11.11.8  lex-search

   lex-search(files)
   case pattern1
      body1
   case pattern2
      body2
   ...
   default
      bodyd

The lex-search function is like the lex function, but input that does not match any of the regular expressions is skipped. If the clauses include a default case, then the default matches any skipped text.

For example, the following program collects all occurrences of alphanumeric words in an input file, skipping any other text.

    collect-words($(files)) =
       words[] =
       lex-search($(files))
       default
          eprintln(Skipped $0)
       case $"[[:alnum:]]+" g
          words[] += $0
          export

The default case, if one exists, matches single characters.

It is an error if the input does not match any of the regular expressions.

The break function can be used to abort the loop.

11.11.9  Lexer

The Lexer object defines a facility for lexical analysis, similar to the lex(1) and flex(1) programs.

In omake, lexical analyzers can be constructed dynamically by extending the Lexer class. A lexer definition consists of a set of directives specified with method calls, and set of clauses specified as rules.

For example, consider the following lexer definition, which is intended for lexical analysis of simple arithmetic expressions for a desktop calculator.

   lexer1. =
      extends $(Lexer)

      other: .
         eprintln(Illegal character: $* )
         lex()

      white: $"[[:space:]]+"
         lex()

      op: $"[-+*/()]"
         switch $*
         case +
            Token.unit($(loc), plus)
         case -
            Token.unit($(loc), minus)
         case *
            Token.unit($(loc), mul)
         case /
            Token.unit($(loc), div)
         case $"("
            Token.unit($(loc), lparen)
         case $")"
            Token.unit($(loc), rparen)

      number: $"[[:digit:]]+"
         Token.pair($(loc), exp, $(int $* ))

      eof: $"\'"
         Token.unit($(loc), eof)

This program defines an object lexer1 that extends the Lexer object, which defines the lexing environment.

The remainder of the definition consists of a set of clauses, each with a method name before the colon; a regular expression after the colon; and in this case, a body. The body is optional; if it is not specified, the method with the given name should already exist in the lexer definition.

NB The clause that matches the longest prefix of the input is selected. If two clauses match the same input prefix, then the last one is selected. This is unlike most standard lexers, but makes more sense for extensible grammars.

The first clause matches any input that is not matched by the other clauses. In this case, an error message is printed for any unknown character, and the input is skipped. Note that this clause is selected only if no other clause matches.

The second clause is responsible for ignoring white space. If whitespace is found, it is ignored, and the lexer is called recursively.

The third clause is responsible for the arithmetic operators. It makes use of the Token object, which defines three fields: a loc field that represents the source location; a name; and a value.

The lexer defines the loc variable to be the location of the current lexeme in each of the method bodies, so we can use that value to create the tokens.

The Token.unit($(loc), name) method constructs a new Token object with the given name, and a default value.

The number clause matches nonnegative integer constants. The Token.pair($(loc), name, value) constructs a token with the given name and value.

Lexer objects operate on InChannel objects. The method lexer1.lex-channel(channel) reads the next token from the channel argument.
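
As an illustrative sketch, the lexer1 object defined above could be applied to a string channel as follows. The input text and the variable names chan and tok are hypothetical, and it is assumed that the fields of the returned token can be read with the usual field syntax.

    chan = $(open-in-string $"12 + 3")
    # Read the first token from the channel.
    tok = $(lexer1.lex-channel $(chan))
    println($"token name: $(tok.name)")
    close($(chan))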

11.11.10  Lexer matching

During lexical analysis, clauses are selected by longest match. That is, the clause that matches the longest sequence of input characters is chosen for evaluation. If no clause matches, the lexer raises a RuntimeException. If more than one clause matches the same amount of input, the last one is chosen for evaluation.

11.11.11  Extending lexer definitions

Suppose we wish to augment the lexer example so that it ignores comments. We will define a comment as any text that begins with the string (* and ends with *); comments may be nested.

One convenient way to do this is to define a separate lexer just to skip comments.

   lex-comment. =
      extends $(Lexer)

      level = 0

      other: .
         lex()

      term: $"[*][)]"
         if $(not $(eq $(level), 0))
            level = $(sub $(level), 1)
            lex()

      next: $"[(][*]"
         level = $(add $(level), 1)
         lex()

      eof: $"\'"
         eprintln(Unterminated comment)

This lexer contains a field level that keeps track of the nesting level. On encountering a (* string, it increments the level, and for *), it decrements the level if nonzero, and continues.

Next, we need to modify our previous lexer to skip comments. We can do this by extending the lexer object lexer1 that we just created.

   lexer1. +=
      comment: $"[(][*]"
         lex-comment.lex-channel($(channel))
         lex()

The body for the comment clause calls the lex-comment lexer when a comment is encountered, and continues lexing when that lexer returns.

11.11.12  Threading the lexer object

Clause bodies may also end with an export directive. In this case the lexer object itself is used as the returned token. If used with the Parser object below, the lexer should define the loc, name and value fields in each export clause. Each time the Parser calls the lexer, it calls it with the lexer returned from the previous lex invocation.

11.11.13  Parser

The Parser object provides a facility for syntactic analysis based on context-free grammars.

Parser objects are specified as a sequence of directives, specified with method calls; and productions, specified as rules.

For example, let's finish building the desktop calculator started in the Lexer example.

   parser1. =
      extends $(Parser)

      #
      # Use the main lexer
      #
      lexer = $(lexer1)

      #
      # Precedences, in ascending order
      #
      left(plus minus)
      left(mul div)
      right(uminus)

      #
      # A program
      #
      start(prog)

      prog: exp eof
         return $1

      #
      # Simple arithmetic expressions
      #
      exp: minus exp :prec: uminus
         neg($2)

      exp: exp plus exp
         add($1, $3)

      exp: exp minus exp
         sub($1, $3)

      exp: exp mul exp
         mul($1, $3)

      exp: exp div exp
         div($1, $3)

      exp: lparen exp rparen
         return $2

Parsers are defined as extensions of the Parser class. A Parser object must have a lexer field. The lexer is not required to be a Lexer object, but it must provide a lexer.lex() method that returns a token object with name and value fields. For this example, we use the lexer1 object that we defined previously.

The next step is to define precedences for the terminal symbols. The precedences are defined with the left, right, and nonassoc methods in order of increasing precedence.

The grammar must have at least one start symbol, declared with the start method.

Next, the productions in the grammar are listed as rules. The name of the production is listed before the colon, and a sequence of variables is listed to the right of the colon. The body is a semantic action to be evaluated when the production is recognized as part of the input.

In this example, these are the productions for the arithmetic expressions recognized by the desktop calculator. The semantic action performs the calculation. The variables $1, $2, ... correspond to the values associated with each of the variables on the right-hand-side of the production.

11.11.14  Calling the parser

The parser is called with the $(parser1.parse-channel start, channel) or $(parser1.parse-file start, file) functions. The start argument is the start symbol, and the channel or file is the input to the parser.
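
For instance, the calculator might be run on a string channel as in the sketch below. It assumes the parser1 object and the prog start symbol from the example above. The input expression and the variable names are hypothetical.

    chan = $(open-in-string $"1 + 2 * 3")
    # Parse the whole input, starting from the prog symbol.
    result = $(parser1.parse-channel prog, $(chan))
    println($(result))
    close($(chan))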

11.11.15  Parsing control

The parser generator generates a pushdown automaton based on LALR(1) tables. As usual, if the grammar is ambiguous, this may generate shift/reduce or reduce/reduce conflicts. These conflicts are printed to standard output when the automaton is generated.

By default, the automaton is not constructed until the parser is first used.

The build(debug) method forces the construction of the automaton. While not required, it is wise to finish each complete parser with a call to the build(debug) method. If the debug variable is set, this also prints the parser table together with any conflicts.

The loc variable is defined within action bodies, and represents the input range for all tokens on the right-hand-side of the production.

11.11.16  Extending parsers

Parsers may also be extended by inheritance. For example, let's extend the grammar so that it also recognizes the << and >> shift operations.

First, we extend the lexer so that it recognizes these tokens. This time, we choose to leave lexer1 intact, instead of using the += operator.

   lexer2. =
      extends $(lexer1)

      lsl: $"<<"
         Token.unit($(loc), lsl)

      asr: $">>"
         Token.unit($(loc), asr)

Next, we extend the parser to handle these new operators. We intend that the bitwise operators have lower precedence than the other arithmetic operators. The two-argument form of the left method accomplishes this.

   parser2. =
      extends $(parser1)

      left(plus, lsl lsr asr)

      lexer = $(lexer2)

      exp: exp lsl exp
         lsl($1, $3)

      exp: exp asr exp
         asr($1, $3)

In this case, we use the new lexer lexer2, and we add productions for the new shift operations.

11.11.17  Passwd

The Passwd object represents an entry in the system's user database. It contains the following fields.

pw_name: the login name.
pw_passwd: the encrypted password.
pw_uid: user id of the user.
pw_gid: group id of the user.
pw_gecos: the user name or comment field.
pw_dir: the user's home directory.
pw_shell: the user's default shell.

Not all the fields will have meaning on all operating systems.

11.11.18  getpwnam, getpwuid

    $(getpwnam name...) : Passwd
       name : String
    $(getpwuid uid...) : Passwd
       uid : Int
    raises RuntimeException

The getpwnam function looks up an entry by the user's login and the getpwuid function looks up an entry by user's numerical id (uid). If no entry is found, an exception will be raised.
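
A minimal sketch of a lookup that handles the failure case is shown below. The login name root is only an example and may not exist on every system.

   try
      # Look up the entry by login name and read two of its fields
      pw = $(getpwnam root)
      println($"home directory: $(pw.pw_dir)")
      println($"default shell: $(pw.pw_shell)")
   catch RuntimeException(e)
      # Raised when no matching entry is found
      eprintln($"user lookup failed")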

11.11.19  getpwents

    $(getpwents) : Array

The getpwents function returns an array of Passwd objects, one for every user found in the system user database. Note that depending on the operating system and on the setup of the user database, the returned array may be incomplete or even empty.
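
For illustration, the array can be traversed with foreach. This sketch assumes the foreach(x => ..., ...) form of the loop; older releases write the loop header as foreach(pw, $(getpwents)) instead.

   # Print the login name and numerical id of every entry that was returned
   foreach(pw => ..., $(getpwents))
      println($"$(pw.pw_name) uid=$(pw.pw_uid)")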

11.11.20  Group

The Group object represents an entry in the system's user group database. It contains the following fields.

gr_name: the group name.
gr_group: the encrypted password.
gr_gid: group id of the group.
gr_mem: the user names of the group's members.

Not all the fields will have meaning on all operating systems.

11.11.21  getgrnam, getgrgid

    $(getgrnam name...) : Group
       name : String
    $(getgrgid gid...) : Group
       gid : Int
    raises RuntimeException

The getgrnam function looks up a group entry by the group's name and the getgrgid function looks up an entry by the group's numerical id (gid). If no entry is found, an exception will be raised.
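
A group lookup can be sketched in the same way as a user lookup. The group name wheel is hypothetical and may not exist on your system.

   try
      gr = $(getgrnam wheel)
      println($"group id: $(gr.gr_gid)")
      println($"members: $(gr.gr_mem)")
   catch RuntimeException(e)
      # Raised when no matching entry is found
      eprintln($"group lookup failed")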

11.11.22  tgetstr

   $(tgetstr id) : String
      id : String

The tgetstr function looks up the terminal capability with the indicated id. This assumes that the terminfo entry to look up is named by the TERM environment variable. This function returns an empty value if the given terminal capability is not defined.

Note: if you intend to use the value returned by tgetstr inside the shell prompt, you need to wrap it using the prompt-invisible function.
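
As a sketch, a bold prompt might be set up as follows. It assumes that bold and sgr0 are capability names known to the terminal description, and that the osh prompt is taken from the prompt variable. If a capability is undefined, tgetstr returns an empty value and the prompt is simply left unstyled.

   # Wrap each escape sequence so that it does not count toward the prompt width
   bold-on  = $(prompt-invisible $(tgetstr bold))
   bold-off = $(prompt-invisible $(tgetstr sgr0))
   prompt   = $"$(bold-on)osh>$(bold-off) "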

11.11.23  xterm-escape-begin, xterm-escape-end

   $(xterm-escape-begin) : String
   $(xterm-escape-end) : String

The xterm-escape-begin and xterm-escape-end functions return the escape sequences that can be used to set the XTerm window title. They return empty values if this capability is not available.

Note: if you intend to use these strings inside the shell prompt, you need to use $(prompt-invisible-begin)$(xterm-escape-begin) and $(xterm-escape-end)$(prompt-invisible-end).

11.11.24  xterm-escape

   $(xterm-escape s) : Sequence

When the TERM environment variable indicates that the XTerm title setting capability is available, $(xterm-escape s) is equivalent to $(xterm-escape-begin)s$(xterm-escape-end). Otherwise, it returns an empty value.
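
For example, a script might set the window title directly, or a prompt might carry a title. This is a sketch only: the title text is arbitrary, and it assumes that the osh prompt is taken from the prompt variable.

   # Set the window title from a script (prints nothing if unsupported)
   print($(xterm-escape omake build))

   # Embed a title in the prompt; the escape sequence must be marked invisible
   prompt = $"$(prompt-invisible $(xterm-escape osh))osh> "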

Note: if you intend to use the value returned by xterm-escape inside the shell prompt, you need to wrap it using the prompt-invisible function.

11.11.25  prompt-invisible-begin, prompt-invisible-end

   $(prompt-invisible-begin) : String
   $(prompt-invisible-end) : String

The prompt-invisible-begin and prompt-invisible-end functions return the escape sequences that must be used to mark the “invisible” sections of the shell prompt (such as various escape sequences).

11.11.26  prompt-invisible

   $(prompt-invisible s) : Sequence

The prompt-invisible function wraps its argument with $(prompt-invisible-begin) and $(prompt-invisible-end). All the “invisible” sections of the shell prompt (such as various escape sequences) must be wrapped this way.

11.11.27  gettimeofday

   $(gettimeofday) : Float

The gettimeofday function returns the time of day in seconds since January 1, 1970.
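
One use is measuring elapsed time. The sketch below assumes the arithmetic function sub accepts floating-point arguments; the work being timed is left as a placeholder.

   start = $(gettimeofday)
   # ... commands to be timed go here ...
   elapsed = $(sub $(gettimeofday), $(start))
   println($"elapsed: $(elapsed) seconds")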

This web site is published by Informatikbüro Gerd Stolpmann