acut (1)     Gets parts of a file     (Jul-2016)

acut [-Ccomment] [-dc] [-el] [-T s1 s2] [-acols|-ccols|-fcols]... [-itext]... [-map mapping_instructions] [file...]

The acut command extracts and reformats columns of a file, and can insert fixed text. Unlike cut(1), the columns can be output in any order, and several cut/insert arguments can be specified. The columns can be raw character positions (-c) or fields delimited by a delimiter character (-f). Since version 4.4, acut also accepts a mapping string, i.e. a mixture of plain text and excerpts from the input text.

acut can also translate characters (replace them one by one) in specified columns (T), remove leading and/or trailing blanks in specified columns (h, j and t), or perform regular expression substitutions in specified columns (s).

acut is especially useful for reformatting decimal or sexagesimal numbers so that they are properly aligned (see below). Accepted sexagesimal representations include blank- or colon-separated numbers (like 12:04:56 or 12 04 56), as well as numbers without separators (like 120445).
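The following minimal Python sketch converts the three accepted spellings to a decimal value. It is not acut's own code. The function name parse_sexagesimal is hypothetical. It also assumes that a value without separators is read as two-digit groups (hhmmss, with any fraction kept on the seconds).

```python
import re


def parse_sexagesimal(text: str) -> float:
    """Convert '12:04:56', '12 04 56' or '120456' into decimal hours/degrees.

    Hypothetical helper: a value without separators is assumed to be
    written as hhmmss[.fff], i.e. two-digit minute and second groups.
    """
    s = text.strip()
    sign = -1.0 if s.startswith("-") else 1.0
    s = s.lstrip("+-").strip()
    if re.search(r"[:\s]", s):
        # Blank- or colon-separated parts: "12:04:56" or "12 04 56"
        parts = re.split(r"[:\s]+", s)
    else:
        # No separator: split the integer part as hh|mm|ss, keep the fraction on ss
        whole, dot, frac = s.partition(".")
        whole = whole.zfill(6)
        parts = [whole[:-4], whole[-4:-2], whole[-2:] + dot + frac]
    value = 0.0
    for rank, part in enumerate(parts):
        value += float(part) / 60 ** rank  # weights 1, 1/60, 1/3600
    return sign * value


print(parse_sexagesimal("12:04:56"), parse_sexagesimal("120456"))
```

A separator-free value with fewer than six digits is ambiguous. This sketch simply left-pads it with zeros, which may not match what acut does.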


-version    prints the version and exits.

-Ccomment    defines comment lines: lines starting with comment are copied to the output without reformatting. For instance, -C'#' treats lines starting with a hash sign as comments. By default, blank lines are treated as comments.

-dc    defines the character c used as the delimiter between columns; this delimiter is only used in conjunction with the -f option. The default is the tab.

-el    processes empty lines. By default, empty lines are copied directly to the output, on the assumption that they only act as separators. (Added in V3.92.)

-T s1 s2    defines the translation table: characters from s1 are converted into the corresponding characters of s2 (byte-by-byte translation). Strings s1 and s2 must have the same length. The complete alphabet may be specified by \a (lowercase) or \A (uppercase); \s stands for the 6 whitespace characters (blank, tab, vertical tab, newline, carriage return, form feed). Non-printable characters can be specified using the C conventions (e.g. \n for the newline, \e for the Escape, or the octal representation).

As a shortcut, acut accepts the options -Tlow and -Tup for conversion to lowercase / uppercase.
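The byte-by-byte translation defined by -T behaves much like Python's str.maketrans / str.translate. The sketch below is only an illustration. make_translation and expand are hypothetical names, and only the \a, \A and \s shorthands are expanded; C escapes such as \n or octal codes are not handled.

```python
import string

# Shorthands described for -T (C-style escapes are not handled in this sketch)
SHORTHANDS = {
    r"\a": string.ascii_lowercase,
    r"\A": string.ascii_uppercase,
    r"\s": " \t\v\n\r\f",  # blank, tab, vtab, newline, carriage-return, form-feed
}


def expand(spec: str) -> str:
    """Replace the shorthands by the characters they stand for."""
    for short, chars in SHORTHANDS.items():
        spec = spec.replace(short, chars)
    return spec


def make_translation(s1: str, s2: str) -> dict:
    """Build a byte-by-byte table mapping each character of s1 to the one of s2."""
    src, dst = expand(s1), expand(s2)
    if len(src) != len(dst):
        raise ValueError("s1 and s2 must have the same length")
    return str.maketrans(src, dst)


to_upper = make_translation(r"\a", r"\A")  # similar in spirit to -Tup
print("ngc 1976".translate(to_upper))      # NGC 1976
```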

-acols    inserts a delimiter (the character defined by the -d option) between character columns; for example, acut -a1-15,16- inserts a tab between bytes 15 and 16.
Note that only one -a option can be specified.
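The following Python sketch shows roughly what -a does: it cuts a line at fixed byte positions and joins the pieces with the delimiter. insert_delimiters is a hypothetical helper. It takes the ranges already parsed as 1-based inclusive (start, end) pairs, with end set to None for "up to the end of the line".

```python
from typing import List, Optional, Tuple


def insert_delimiters(line: str, ranges: List[Tuple[int, Optional[int]]],
                      delim: str = "\t") -> str:
    """Cut `line` into 1-based inclusive byte ranges and join them with `delim`."""
    # line[start-1:end] turns a 1-based inclusive range into a Python slice;
    # end=None slices up to the end of the line.
    return delim.join(line[start - 1:end] for start, end in ranges)


# Roughly what acut -a1-15,16- does to one line
row = "Betelgeuse".ljust(15) + "05 55 10.3"
print(repr(insert_delimiters(row, [(1, 15), (16, None)])))
```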

-ccols    defines a set of character columns. cols expressions can use commas and dashes, e.g. -c1,80,2-5 picks successively column 1, then column 80, then the range 2 to 5. cols may be empty, meaning all consecutive columns between the previous -c and the next -c definitions. For details on how cols can be written, refer to the cols expressions section.

-fcols    is similar to the -c option, but columns are defined as the sets of bytes between two delimiter characters.

-itext    inserts the fixed text following the -i into each line of the output. A -i alone inserts the delimiter (specified by the -d option, tab by default).

-map mapping_expression    combines in a single argument a mixture of text (as with -i) and column excerpts (as with -f or -c, depending on whether a delimiter character was defined via the -d option). In this mode, $ (or @) introduces an excerpt, either as $n (where n is a number) for column number n ($1 is the leftmost column, while $0 represents the whole line), or as a cols expression within curly braces, like ${expr}.
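The Python sketch below shows one way $n, $0 and ${expr} excerpts might be expanded. It is not acut's implementation. apply_map and pick are hypothetical names, and only the simplest cols forms (n, a-b and a-) are understood inside braces. A delim of None stands for the case where no -d option was given, so excerpts are taken as bytes.

```python
import re
from typing import Optional

# $n, @n, ${expr} or @{expr}
EXCERPT = re.compile(r"[$@](?:\{([^}]*)\}|(\d+))")


def pick(line: str, spec: str, delim: Optional[str]) -> str:
    """Return the excerpt for 'n', 'a-b' or 'a-' (fields if delim is set, else bytes)."""
    if spec == "0":
        return line  # $0 is the whole line
    units = line.split(delim) if delim is not None else list(line)
    if "-" in spec:
        first, last = spec.split("-", 1)
        lo, hi = int(first), (int(last) if last else len(units))
    else:
        lo = hi = int(spec)
    # Fields keep their delimiter between them; bytes are simply concatenated
    return (delim or "").join(units[lo - 1:hi])


def apply_map(mapping: str, line: str, delim: Optional[str] = None) -> str:
    """Replace every excerpt reference in `mapping` by the matching part of `line`."""
    def replace(m):
        spec = m.group(1) if m.group(1) is not None else m.group(2)
        return pick(line, spec, delim)
    return EXCERPT.sub(replace, mapping)


print(apply_map("Col.1=$1 rest=${3-}", "abcdef"))
print(apply_map("name=$2 all=$0", "M42\tOrion\tnebula", delim="\t"))
```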

cols expressions
cols expressions define a set of positions (starting from 1) representing either byte positions (-c) or column numbers (-f). Numbers may be separated by commas, e.g. -c1,5 for bytes 1 and 5; by a dash, like -c5-10 for bytes 5 through 10 (i.e. 6 consecutive bytes); or by a plus, like -c5+6 for 6 bytes starting at byte 5 (i.e. bytes 5 through 10).

There is however a difference between -f5-10 and -f5+6: the former (-f5-10) consists of the 6 columns 5 to 10 separated by their 5 delimiters (the character defined by the -d option), while the latter (-f5+6) has no delimiter between the 6 columns (they are concatenated). Another way of concatenating delimiter-separated columns is to start the list of fields with a - (minus), e.g. -f5+6 and -f-5-10 are equivalent.
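The distinction between the two forms can be illustrated with a small Python sketch. fields is a hypothetical helper, not part of acut.

```python
def fields(line: str, first: int, last: int, delim: str = "\t",
           concatenate: bool = False) -> str:
    """Fields first..last (1-based, inclusive), kept delimited or concatenated."""
    selected = line.split(delim)[first - 1:last]
    return ("" if concatenate else delim).join(selected)


row = "\t".join(f"f{i}" for i in range(1, 12))     # f1<TAB>f2<TAB>...<TAB>f11
print(repr(fields(row, 5, 10)))                    # like -f5-10
print(repr(fields(row, 5, 10, concatenate=True)))  # like -f5+6
```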

A missing number after the dash means up to the end of the line, and a missing number before the dash means from the byte following the last defined column. A repetition factor can be added, e.g. -c5+6*4 specifies the 4 fields lying over bytes 5–10, 11–16, 17–22 and 23–28.
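The rules above can be summarised by a small parser. The Python sketch below is an illustration built on assumptions, not acut's actual grammar. parse_cols is a hypothetical name. Reformatting suffixes (%, s, j, ...) are expected to be stripped beforehand, and the empty and .. forms are not handled.

```python
import re
from typing import List, Optional, Tuple

# [start][-|+][end-or-width][*repeat]
ITEM = re.compile(r"^(\d*)([-+]?)(\d*)(?:\*(\d+))?$")


def parse_cols(expr: str, last_end: int = 0) -> List[Tuple[int, Optional[int]]]:
    """Expand a cols expression into 1-based inclusive (start, end) ranges.

    end is None for 'up to the end of the line'; last_end is the last
    position defined by previous arguments (used when start is omitted).
    """
    ranges: List[Tuple[int, Optional[int]]] = []
    for item in expr.split(","):
        m = ITEM.match(item)
        if not m:
            raise ValueError(f"bad cols item: {item!r}")
        first, op, second, repeat = m.groups()
        start = int(first) if first else last_end + 1
        if op == "+":
            end: Optional[int] = start + int(second) - 1  # a+w: w positions from a
        elif op == "-":
            end = int(second) if second else None          # a-b, or a- to end of line
        else:
            end = start                                    # single position
        for _ in range(int(repeat) if repeat else 1):
            ranges.append((start, end))
            if end is None:
                break
            width = end - start + 1
            last_end = end
            start, end = end + 1, end + width              # next adjacent block
    return ranges


print(parse_cols("5+6*4"))  # [(5, 10), (11, 16), (17, 22), (23, 28)]
```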

cols may also be

  • empty (-c alone), meaning all consecutive columns between the previous and the next cols definitions
  • .. (as in -c..), meaning all remaining columns, i.e. columns not mentioned in previous arguments.

cols expressions may end with the following reformatting options:

  • [%] (format %[+][0]n[:|.]d[opt]) aligns the numeric value, possibly expressed in sexagesimal, on its decimal point (.), assuming up to d decimals, or trailing positions reserved for comments such as uncertainty flags; a sign can be inserted (+), and the number may be zero-filled (0). See the section on Numbers below for details.
  • [s]regular_expression_substitution    performs a substitution in the specified set of bytes, in a way similar to sed(1). See the section on Pattern Substitution below for details.
  • [a]value    adds the specified (integer) value to the column.
  • [+] (format +n[a|A|d|o|x|X])    edits a counter of width n when the field is identical to the same field in the preceding line (see the Counter section below).
  • [b] blanks the cols if their contents are identical to the previous line. This option is only available for -c column specifications.
  • [B] replaces blank columns by the contents of the previous record. This option is only available for -c column specifications.
  • [h] removes the leading (head) blanks; it is identical to %- (left-aligned format).
  • [j] justifies blanks, i.e. removes both leading and trailing blanks; it has the same effect as ht.
  • [l] adjusts to the left, and is identical to h.
  • [n] indicates a numeric field (see the section on Numbers below).
  • [r] adjusts to the right, and is identical to t.
  • [q] unquotes the column: removes the leading and trailing quote ("), and replaces each pair of adjacent quotes by a single one, following the SQL conventions.
  • [t] removes the trailing (tail) blanks.
  • [T] translates the cols according to the translation table specified by the -T option. The -T option must of course precede the cols definitions.

For instance, translating the first column to uppercase and removing leading and trailing blanks in the other columns can be written as

-T '\a' '\A' -f1T -f2-j
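The Python sketch below approximates several of the options listed above, applied either to a single value or to the same columns over successive lines. It is not acut's actual behaviour. The function names are hypothetical, and "blanks" are assumed to be spaces and tabs.

```python
from typing import List

BLANKS = " \t"  # assumption: blanks are spaces and tabs


def head(s: str) -> str:      # [h] / [l]
    return s.lstrip(BLANKS)


def tail(s: str) -> str:      # [t] / [r]
    return s.rstrip(BLANKS)


def justify(s: str) -> str:   # [j], same as h followed by t
    return tail(head(s))


def unquote(s: str) -> str:   # [q], SQL-style unquoting
    if len(s) >= 2 and s[0] == s[-1] == '"':
        s = s[1:-1]
    return s.replace('""', '"')


def blank_repeats(values: List[str]) -> List[str]:
    """[b]: blank a value identical to the one on the previous line."""
    out, prev = [], None
    for v in values:
        out.append(" " * len(v) if v == prev else v)
        prev = v
    return out


def fill_blanks(values: List[str]) -> List[str]:
    """[B]: replace an all-blank value by the (possibly filled) previous one."""
    out: List[str] = []
    for v in values:
        if not v.strip() and out:
            v = out[-1]
        out.append(v)
    return out


print(unquote('"say ""hi"""'))  # say "hi"
```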

Numbers (%)
With the %[+][0]n[:|.]d[opt] option, the field is assumed to contain a decimal number with up to d decimals. This option also realigns any prefix (typically a limit flag) and any suffix (typically an uncertainty flag like a colon).

In this option, the following transformations are performed:

  • the sign can be systematically inserted at the very left with the %+ option.
  • the decimal point is aligned in column (n-d) with the %n.d option, or the rightmost digit is aligned in column (n-d) with the %n:d option; for sexagesimal representations, the decimal point must belong to the seconds part.
  • the blanks between the sign and the digits are removed, or are filled with zeroes when the number n starts with 0.
  • the prefix, if any, is left flushed.
  • the suffix, if any, is right flushed.
  • 1-letter options mean the following:
    • s: the number is in sexagesimal representation, either time or angle (h m s, or ° ' '').
    • S: the number is in sexagesimal representation, and the decimal fractions of the degree/hour or minutes are converted to a full sexagesimal representation, e.g. 23 12.5 is changed into 23 12 30.
    • r: the number is rounded to the specified number of decimals if more than d decimals are supplied in the input.
    • r!: in addition to r, limits the value to 9-filled bounds (-9... for the minimum, 99... for the maximum).
    • q: adds a lowercase quality letter to the number: a when the number actually has d decimals, b when d-1 decimals are supplied, etc. For sexagesimal representations, one sexagesimal part counts as 2 digits.
    • Q: uppercase quality letter (see q).
    • <num: limits the number to an upper bound: when the value of the field is larger than num, its contents are replaced by num (see also the r! option).
    The options may be combined, e.g. %12.2sr to round a sexagesimal value.
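Two of these options lend themselves to a short illustration. The Python sketch below is hypothetical: neither function name comes from acut. to_full_sexagesimal mimics the S conversion for non-negative values using exact fractions. quality computes the q/Q letter for a plain decimal number only; the "one sexagesimal part counts as 2 digits" rule is not implemented.

```python
import math
from fractions import Fraction


def to_full_sexagesimal(text: str) -> str:
    """S option sketch: '23 12.5' -> '23 12 30' (non-negative values only)."""
    parts = [Fraction(p) for p in text.split()]
    # Value expressed in units of the first part (degrees or hours)
    total = sum(p / 60 ** rank for rank, p in enumerate(parts))
    whole = math.floor(total)
    minutes_exact = (total - whole) * 60
    minutes = math.floor(minutes_exact)
    seconds = (minutes_exact - minutes) * 60
    if seconds.denominator == 1:
        sec_text = f"{int(seconds):02d}"
    else:
        sec_text = f"{float(seconds):g}"
    return f"{whole:02d} {minutes:02d} {sec_text}"


def quality(text: str, d: int, upper: bool = False) -> str:
    """q/Q sketch: 'a' if d decimals are present, 'b' for d-1, and so on."""
    _, _, frac = text.strip().partition(".")
    letter = chr(ord("a") + max(d - len(frac), 0))
    return letter.upper() if upper else letter


print(to_full_sexagesimal("23 12.5"))  # 23 12 30
```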

With a format %10.3, the following lines show the transformations applied to a few numbers (the first column contains the input data, the second the transformed numbers):

    0123456789  0123456789
     -  12         -12.       
    12.2:           12.2 :
     <10?       <   10.  ?

When the alignment is not possible or a truncation occurred, a warning message is issued.
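The plain %n.d alignment can be reproduced with a short Python sketch that yields the three rows above. It is only an approximation under stated assumptions. align_decimal is a hypothetical name, and the %+, 0, %n:d and sexagesimal variants are left out. Where acut would print a warning, the sketch raises ValueError instead.

```python
import re

# prefix, sign, integer digits, decimal point, decimals, suffix
NUMBER = re.compile(r"^\s*([^-+\d.\s]*)\s*([-+]?)\s*(\d*)(\.?)(\d*)\s*(.*?)\s*$")


def align_decimal(field: str, n: int, d: int) -> str:
    """Plain %n.d: decimal point in column n-d (1-based), prefix flushed left,
    suffix flushed right, blanks between sign and digits removed."""
    m = NUMBER.match(field)
    if not m or not (m.group(3) or m.group(5)):
        raise ValueError(f"no number in {field!r}")
    prefix, sign, int_digits, _point, decimals, suffix = m.groups()
    before = n - d - 1                     # characters to the left of the point
    number = sign + int_digits
    if len(prefix) + len(number) > before or len(decimals) + len(suffix) > d:
        raise ValueError(f"{field!r} does not fit in %{n}.{d}")  # acut warns here
    left = prefix + " " * (before - len(prefix) - len(number)) + number
    right = decimals + " " * (d - len(decimals) - len(suffix)) + suffix
    return left + "." + right


for raw in [" -  12", "12.2:", " <10?"]:
    print(f"{raw:<12}{align_decimal(raw, 10, 3)}")
```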

Pattern Substitution (s)
As with sed(1), a substitution can be applied to the selected columns. For instance, removing all blanks in bytes 1 to 12 can be written:

acut -c1-12s'/ //g' -c13-

or squeezing the blanks (leaving just one when there are several)

acut -c1-12s'/  */ /g' -c13-

The substitution can use the \1 ... back-references to matched groups, as in sed: for instance, swapping the texts separated by a comma in bytes 1 to 12 can be written

acut -c1-12s'/^\([^,]*\),\(.*\)/\2,\1/'
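For readers more used to other regular-expression dialects, the three sed(1) substitutions above correspond roughly to the following Python re.sub calls. substitute_cols is a hypothetical helper that edits bytes start..end and leaves the rest of the line untouched, much like -c1-12s'...' -c13-. Note that sed's basic \( \) groups become plain ( ) in Python, and a missing g flag corresponds to count=1.

```python
import re


def substitute_cols(line: str, start: int, end: int, pattern: str,
                    repl: str, count: int = 0) -> str:
    """Apply re.sub to bytes start..end (1-based, inclusive); keep the rest."""
    segment = line[start - 1:end]
    return line[:start - 1] + re.sub(pattern, repl, segment, count=count) + line[end:]


line = "a b  c   d".ljust(12) + "|tail"
print(substitute_cols(line, 1, 12, " ", ""))     # s'/ //g'   -> abcd|tail
print(substitute_cols(line, 1, 12, "  *", " "))  # s'/  */ /g' -> a b c d |tail

# s'/^\([^,]*\),\(.*\)/\2,\1/' (no g flag, so at most one replacement)
names = "Smith,John".ljust(12) + "|x"
print(substitute_cols(names, 1, 12, r"^([^,]*),(.*)", r"\2,\1", count=1))
```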

Counter (+)
It is possible to write a counter which is incremented each time two consecutive lines have an identical column, and reset to 1 when the column differs from the previous line. This option may also follow the B cols expression, as in the example -c1-11B+1a, which replaces a blank field in bytes 1-11 by a repetition of the contents from the previous line, followed by "a" for the first (non-blank) line, "b" for the next (blank) line, etc. However, the counter is inserted only when at least two consecutive lines have an identical column.

The complete option is +n[a|A|d|o|x|X] where:

  • the number n is the width of the extra column with the edited counter; the default value is 1
  • the letter specifies how to edit the counter:
    • [a] for lowercase alphabetic
    • [A] for uppercase alphabetic
    • [d] for decimal
    • [o] for octal
    • [x] for hexadecimal lowercase
    • [X] for hexadecimal uppercase
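The counter rules can be sketched in Python as follows, using itertools.groupby to find runs of identical values. Several assumptions apply. edit_counter and counters are hypothetical names, the counter is right-aligned within its width, and alphabetic counters are limited to 26 values. A line whose value is not repeated gets a blank counter of the same width. With B, blank fields would be filled first, so the comparison sees repeated values.

```python
from itertools import groupby
from typing import List

STYLES = {"d": "d", "o": "o", "x": "x", "X": "X"}  # format() codes for numeric styles


def edit_counter(k: int, width: int = 1, style: str = "a") -> str:
    """Edit counter value k (starting at 1) in the requested style and width."""
    if style in ("a", "A"):
        if not 1 <= k <= 26:
            raise ValueError("alphabetic counter limited to 26 values in this sketch")
        text = chr(ord(style) + k - 1)          # 1 -> a (or A), 2 -> b, ...
    else:
        text = format(k, STYLES[style])
    return text.rjust(width)


def counters(keys: List[str], width: int = 1, style: str = "a") -> List[str]:
    """One counter per line: 1, 2, ... within a run of identical keys, reset
    when the key changes, and blank when the key is not repeated."""
    out: List[str] = []
    for _, run in groupby(keys):
        length = len(list(run))
        if length == 1:
            out.append(" " * width)
        else:
            out.extend(edit_counter(k, width, style) for k in range(1, length + 1))
    return out


print(counters(["M 31", "M 31", "M 32", "M 33", "M 33", "M 33"]))
```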

The -c and -f options are mutually exclusive.

Returned Status
The acut command returns 0 in case of success, and non-zero when bad options or unreadable files are found.

Examples
  1. To generate a tab-separated table from an ASCII file, with column 1 from bytes 1 to 10, column 2 from bytes 11 to 20, and the rest as column 3, leading/trailing blanks removed in each column:
    acut -a1-10,11-20,21-j

  2. To rewrite the degree part of the declination in bytes 10-12 with a systematic sign and zero filling:
    acut -c1-9 -c10-12%+03 -c13-

  3. To generate a file containing on each line: column 1, then column 80, then columns 11 through 20, which are assumed to contain a (possibly badly aligned) number with up to 3 decimals, each field preceded by an explanatory label, then all other columns:
    acut -i"Col.1=" -c1 -i" Col.80=" -c80 -i" Cols11-20: " -c11-20%10.3 -i/ -c..
    The same result could be obtained with:
    acut -map 'Col.1=$1 Col.80=$80 Cols11-20: ${11-20%10.3}/$..'
  4. To remove all blanks in columns 10 to 20, leaving other columns intact:
    acut -c -c10-20s'/ //g' -c

See also
cut(1)   sed(1)