11.37 regex()

This function applies a Regular Expression to a string or to a list of strings.

To provide as much functionality as possible, the IDM dynamically integrates the free PCRE library, which enables Perl-like pattern expressions. If you use this function in your product, observe the license terms and documentation of PCRE (www.pcre.org). For linking, see chapter “PCRE Library for Support of Regular Expressions”. ISA recommends PCRE library version 8.* for proper operation. The PCRE functions are used for pattern matching and for capturing parts of strings; the IDM then handles evaluation and replacement.

The IDM accepts Regular Expressions in the following syntax for the two operations, matching and substitution:

  • matching

    m/<pattern>/<modifiers>
  • substitution

    s/<pattern>/<replacement>/<modifiers>
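How an expression in this syntax breaks down into its parts can be sketched in Python. The helper parse_regex_expr and the RegexExpr type are hypothetical and not part of the IDM. Treating a string without delimiters as a plain matching pattern, accepting the bare /<pattern>/ form, and using \/ to escape a slash inside a part are assumptions of this sketch.

```python
import re
from typing import NamedTuple, Optional


class RegexExpr(NamedTuple):
    """Parts of a Regular Expression in the m/.../ or s/.../.../ form."""
    op: str                     # "m" = matching, "s" = substitution
    pattern: str
    replacement: Optional[str]  # None for a matching operation
    modifiers: str


# Optional "m" or "s", then "/"-separated parts; "\/" inside a part is an escaped slash.
# Accepting a bare "/<pattern>/" as matching is an assumption of this sketch.
_DELIMITED = re.compile(
    r"""^(?P<op>[ms])?/
         (?P<pattern>(?:\\.|[^\\/])*)/
         (?:(?P<repl>(?:\\.|[^\\/])*)/)?
         (?P<mods>[A-Za-z]*)$""",
    re.VERBOSE,
)


def parse_regex_expr(expr: str) -> RegexExpr:
    """Hypothetical helper: split an expression into operation, pattern, replacement, modifiers."""
    m = _DELIMITED.match(expr)
    if m is None:
        # No delimiters: the whole string is a plain matching pattern.
        return RegexExpr("m", expr, None, "")
    op = m.group("op") or "m"
    repl = m.group("repl")
    if op == "s" and repl is None:
        raise ValueError(f"substitution without <replacement>: {expr!r}")
    if op == "m" and repl is not None:
        raise ValueError(f"matching expression with a <replacement> part: {expr!r}")
    return RegexExpr(op, m.group("pattern"), repl, m.group("mods"))
```

For example, parse_regex_expr("s/(\\d+)/N/g") yields the operation "s", the pattern (\d+), the replacement N and the modifier g.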

The following <modifiers> are supported for configuring the operation:

  s   single-line string (PCRE_DOTALL)
  m   multi-line (PCRE_MULTILINE)
  i   ignore case (PCRE_CASELESS)
  g   global search: pattern matching is repeated
  x   extended (PCRE_EXTENDED)
  f   first line (PCRE_FIRSTLINE)
  W   Unicode (wide) character classes (PCRE_UCP)
  X   extra (PCRE_EXTRA)
  U   ungreedy (PCRE_UNGREEDY)

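The PCRE option given in parentheses shows how a modifier is passed to PCRE; these modifiers are therefore evaluated by the PCRE library, not by the IDM. The g modifier has no PCRE option: the repetition of the match is performed by the IDM.

The Perl modifiers o and e are ignored without an error message.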
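The translation of modifier letters into PCRE options can be sketched as a lookup table. This Python sketch only produces the option names listed above (their numeric values come from pcre.h). The function translate_modifiers is hypothetical, and rejecting unknown letters is an assumption, since this section does not say how the IDM treats them.

```python
from __future__ import annotations

# Modifier letter -> PCRE option name, as listed above (names only, not numeric flag values).
PCRE_OPTION_FOR = {
    "s": "PCRE_DOTALL",
    "m": "PCRE_MULTILINE",
    "i": "PCRE_CASELESS",
    "x": "PCRE_EXTENDED",
    "f": "PCRE_FIRSTLINE",
    "W": "PCRE_UCP",
    "X": "PCRE_EXTRA",
    "U": "PCRE_UNGREEDY",
}
IGNORED = {"o", "e"}  # skipped without an error message


def translate_modifiers(mods: str) -> tuple[list[str], bool]:
    """Return (PCRE option names, global flag) for a <modifiers> string.

    Rejecting unknown letters with ValueError is an assumption of this sketch.
    """
    options: list[str] = []
    global_search = False
    for ch in mods:
        if ch == "g":
            global_search = True      # no PCRE option: the caller repeats the match
        elif ch in IGNORED:
            continue
        elif ch in PCRE_OPTION_FOR:
            name = PCRE_OPTION_FOR[ch]
            if name not in options:   # a repeated letter is passed only once
                options.append(name)
        else:
            raise ValueError(f"unsupported modifier {ch!r}")
    return options, global_search
```

Keeping g out of the option list reflects that global search is a loop around the match rather than a PCRE compile option.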

The <pattern> itself is processed by the PCRE library, so more details on the syntax of Regular Expressions can be found in the documentation of the PCRE library used.

The IDM also allows the <pattern> and <replacement> parts of the Regular Expression to be passed as two separate parameters; in this case, no <modifiers> can be specified. By default, the operation determines the type of result returned. This can be changed with an Action parameter, e.g. to return only the number of matches found, to filter a list, or to get the values of the variables (capture groups) in <pattern>.

These are the available actions with their corresponding evaluations:

Table 21-2: Actions, return types and evaluations of the regex function

Action          Return Type     Evaluation
--------------  --------------  ------------------------------------------------
regex_eval      boolean         Depends on the operation. Matching: true if at
                string          least one of the strings matches the Regular
                list[string]    Expression, otherwise false. Substitution: the
                                string(s) with the replacement applied where the
                                pattern matches; a string that does not match is
                                returned unchanged.

regex_match     boolean         Checks the pattern only: true if it matches,
                list[string]    otherwise false. For a string list, only the
                                strings that match the pattern are included in
                                the result list.

regex_unmatch   boolean         Checks the pattern only: false if it matches,
                list[string]    otherwise true. For a string list, only the
                                strings that do not match the pattern are
                                included in the result list.

regex_count     integer         Counts the strings in which a match is found.
                                Each string counts at most 1, so the result is 0
                                or 1 for a single string and a value in the range
                                0 … itemcount() for a string list.

regex_locate    integer         For a single string: the character position of
                list[integer]   the first match, counting from 1. For a string
                                list: the index positions of the items in which
                                the pattern matches.
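The count, unmatch and locate evaluations from the table can be modelled in Python, with the re module standing in for PCRE. The functions count_action, unmatch_action and locate_action are hypothetical. The 1-based counting follows the regex_locate example in this section; returning None when a single string does not match is an assumption, since the table does not state the no-match value.

```python
import re
from typing import List, Optional, Union

Subject = Union[str, List[str]]


def _matches(pattern: str, s: str) -> bool:
    return re.search(pattern, s) is not None


def count_action(subject: Subject, pattern: str) -> int:
    """regex_count: each string contributes at most 1."""
    items = [subject] if isinstance(subject, str) else subject
    return sum(1 for s in items if _matches(pattern, s))


def unmatch_action(subject: Subject, pattern: str) -> Union[bool, List[str]]:
    """regex_unmatch: boolean for a string, filtered list for a string list."""
    if isinstance(subject, str):
        return not _matches(pattern, subject)
    return [s for s in subject if not _matches(pattern, s)]


def locate_action(subject: Subject, pattern: str) -> Union[Optional[int], List[int]]:
    """regex_locate: 1-based character position, or 1-based list indexes."""
    if isinstance(subject, str):
        m = re.search(pattern, subject)
        return m.start() + 1 if m else None  # no-match value is an assumption
    return [i for i, s in enumerate(subject, start=1) if _matches(pattern, s)]
```

Note that count_action("aaa", "a") is 1, not 3: the count is per string, not per occurrence.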

The regex function can be applied to any collection data type (list, hash, matrix, vector). However, a generated result list is always of type list; the indexing of the source collection is not carried over. Before the Regular Expression is applied, values that are not of data type string are converted into a string in the same way as, for instance, print <Value>; does.

Definition

anyvalue regex
(
      anyvalue StringOrList input,
      string   Pattern input
  { , string   Replace input }
  { , enum     Action  := regex_eval input }
)

Parameters

anyvalue StringOrList input
This parameter contains the string or the list of strings that the Regular Expression is applied on.
string Pattern input
This parameter contains either the Regular Expression or the pattern string (<pattern>) if the regular expression is split into <pattern> and <replacement>.
string Replace input
This optional parameter should hold the replacement string (<replacement>) if the Regular Expression is split into <pattern> and <replacement> for the function call.
enum Action := regex_eval input

This optional parameter controls the results evaluation. These are the available actions:

regex_eval (default)

Evaluation of the Regular Expression (matching or substitution).

regex_match

Filtering for strings that match the pattern.

regex_unmatch

Filtering for strings that do not match the pattern.

regex_count

Number of strings in which a match is found (at most 1 per string).

regex_vars

Returns the whole match followed by the contents of all variables (capture groups) defined by the search pattern.

regex_locate

Returns the character position or index position of the match found.

Return value

Return value and type depend on the evaluation action and are explained in “Table 21-2” (above).

Examples

  1. Test for digits in a string

    print regex("Is 127 a number?", "\\d+");

    Output

    true
  2. Replace all decimal numbers with an N

    print regex("42 is greater than 10", "s/(\\d+)/N/g");

    Output

    "N is greater than N"
  3. Output only strings that match the pattern

    print regex("127,5", "^\\d+,\\d+$", regex_match);
    print regex("1275", "^\\d+,\\d+$", regex_match);
    print regex(["3,7", "17,5", "0", "21,03"], "^\\d+,\\d+$", regex_match);

    Output

    "127,5"
    ""
    ["3,7","17,5","21,03"]
  4. Applying multiple Regular Expressions on a list

    • list only values that consist of a single word
    • count the values that consist of a single word
    • surround each item with >> <<
    • list all birth years
    variable list BirthDays := ["12-13-1973", "Amy", "1-7-1965", "Tom"];
    print regex(BirthDays, "^\\w+$", regex_match);
    print regex(BirthDays, "^\\w+$", regex_count);
    print regex(BirthDays, "s/(.*)/>> $1 <</", regex_match);
    print regex(BirthDays, "s/\\d+-\\d+-(\\d+)/$1/", regex_match);

    Output

    ["Amy","Tom"]
    2
    [">> 12-13-1973 <<",">> Amy <<",">> 1-7-1965 <<",">> Tom <<"]
    ["1973","1965"]
  5. List the variable values contained in a Regular Expression

    print regex("+2500 dollars or more", "(\\d+)\\s+(\\w+)", regex_vars);

    Output

    ["2500 dollars","2500","dollars"]
  6. List the index positions of the matches found in a list or string

    variable list Locales := [ "de_DE.UTF8", "C.UTF-8", "de_AT.utf8",
                               "en_AU.utf8", "en_ZM", "POSIX", "de_CH.utf8" ];
    print regex(Locales, "/^de_/", regex_locate);
    print regex("Hello World", "/W/", regex_locate);

    Output

    [1,3,7]
    7
  7. Utilizing automatic conversion of values in a list into string values

    record Rec4711 {} 
    print regex([123, winsys_x11, "Bond 007", opt_w2kprefsize_compat, Rec4711],
                "s/\\d+/N/g");

    Output

    ["N","winsys_xN","Bond N","opt_wNkprefsize_compat","RecN"]
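The outputs of examples 2, 4 and 5 can be cross-checked with Python's re module, which handles these simple patterns the same way as PCRE. This checks only the pattern semantics, not the IDM itself. Python writes the back-reference $1 as \1 in a replacement string.

```python
import re

# Example 2: s/(\d+)/N/g; the g modifier replaces every match.
example2 = re.sub(r"(\d+)", "N", "42 is greater than 10")

# Example 4, last statement: s/\d+-\d+-(\d+)/$1/ with regex_match keeps only the
# matching items. count=1 mirrors the missing g modifier (first match only).
birthdays = ["12-13-1973", "Amy", "1-7-1965", "Tom"]
date = re.compile(r"\d+-\d+-(\d+)")
example4 = [date.sub(r"\1", s, count=1) for s in birthdays if date.search(s)]

# Example 5: regex_vars lists the whole match first, then each capture group.
m = re.search(r"(\d+)\s+(\w+)", "+2500 dollars or more")
example5 = [m.group(0), *m.groups()] if m else []

print(example2)  # N is greater than N
print(example4)  # ['1973', '1965']
print(example5)  # ['2500 dollars', '2500', 'dollars']
```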

Availability

Since IDM version A.06.02.g

PCRE Library for Support of Regular Expressions

To use Regular Expressions through the built-in function regex() or as a format in the IDM, the free PCRE library (Perl Compatible Regular Expressions, see www.pcre.org) is required. When using this feature in a product, the license terms of PCRE should therefore be respected.

The IDM needs a PCRE library of version 3 or higher with Unicode support enabled and the standard PCRE interface. The PCRE2 interface introduced with PCRE version 10 is not yet supported. The latest stable 8.* version of the PCRE library is recommended. Most current Linux distributions ship the PCRE library by default or allow it to be installed easily afterwards. On Windows, instead of compiling the library yourself, it may be convenient to download a precompiled library, e.g. from www.pcre.org or www.airesoft.co.uk.

Important

The feature set and the known defects of the PCRE library vary from version to version. Please note that ISA cannot give any warranty for the PCRE library and its functions.

The PCRE library is usually linked dynamically by searching for the functions pcre_compile, pcre_study, pcre_exec, pcre_version and pcre_free. The IDM passes strings in UTF-8 encoding, so the linked PCRE library should also support UTF-8.
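Whether a given PCRE library exports the five functions named above can be checked with Python's ctypes module. This sketch uses Python's own library lookup (ctypes.util.find_library), not the IDM's search order, and probe_pcre is a hypothetical helper that returns None when no library named pcre is found. In the PCRE 8.x API, pcre_free is an exported function pointer variable rather than a function, so the sketch only checks that the symbol exists.

```python
import ctypes
import ctypes.util
from typing import Dict, Optional

# The five entry points named above.
REQUIRED_SYMBOLS = ("pcre_compile", "pcre_study", "pcre_exec", "pcre_version", "pcre_free")


def probe_pcre(name: str = "pcre") -> Optional[Dict[str, object]]:
    """Locate a PCRE (8.x API) shared library and check for the required symbols.

    Returns None if no loadable library is found. Uses Python's library search,
    not the IDM's binding order.
    """
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # hasattr performs a symbol lookup; missing symbols raise AttributeError internally.
    found = {sym: hasattr(lib, sym) for sym in REQUIRED_SYMBOLS}
    version = None
    if found["pcre_version"]:
        # const char *pcre_version(void)
        lib.pcre_version.restype = ctypes.c_char_p
        lib.pcre_version.argtypes = []
        version = lib.pcre_version().decode("ascii")
    return {"path": path, "symbols": found, "version": version}


if __name__ == "__main__":
    print(probe_pcre())
```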

The following linking types and associated search orders are permitted by the IDM:

Table 21-3: Linking types and search orders for the PCRE library

Linking Type                        Windows          Unix/Linux¹
----------------------------------  ---------------  ----------------------
E  Function search directly in the  -                -
   executable

A  Application-oriented (relative   pcre3.dll        pcre.(so|sl)
   to the path of the application)  dll\pcre3.dll    lib/pcre.(so|sl)
                                    pcre.dll         ../lib/pcre.(so|sl)²
                                    dll\pcre.dll

S  System-specific library search   pcre3.dll        pcre.(so|sl)
   (e.g. using the path variables   pcre.dll
   PATH or LD_LIBRARY_PATH)

For an application built with the IDM libraries, binding is attempted in the order E – A – S. The easiest way to give your own IDM application Regular Expression support is therefore to place the dynamic PCRE library next to the executable; otherwise, the library installed on the system is used.

The IDM applications supplied by ISA for development and simulation (IDM, RIDM*, IDMED and Debugger) already contain a statically built-in PCRE library and bind in the order A – E – S, so an external PCRE library placed relative to the application path takes precedence over the built-in one.
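The two binding orders can be sketched as a plan of candidate files built from Table 21-3. search_plan is a hypothetical Python helper that only lists the steps and loads nothing; how the IDM actually resolves the files, and whether .so is tried before .sl, are assumptions not covered by this section.

```python
from __future__ import annotations

from pathlib import PurePosixPath, PureWindowsPath

# Candidate file names per linking type, taken from Table 21-3. "pcre.(so|sl)" is
# expanded to both suffixes; trying .so before .sl is an assumption.
CANDIDATES = {
    "windows": {
        "A": ["pcre3.dll", "dll\\pcre3.dll", "pcre.dll", "dll\\pcre.dll"],
        "S": ["pcre3.dll", "pcre.dll"],
    },
    "unix": {
        "A": ["pcre.so", "pcre.sl", "lib/pcre.so", "lib/pcre.sl",
              "../lib/pcre.so", "../lib/pcre.sl"],
        "S": ["pcre.so", "pcre.sl"],
    },
}


def search_plan(order: str, platform: str, app_dir: str) -> list[tuple[str, str]]:
    """List (linking type, target) steps for a binding order such as "EAS" or "AES".

    Hypothetical helper: it only builds the plan and loads nothing.
    """
    if platform not in CANDIDATES:
        raise ValueError(f"unknown platform {platform!r}")
    pure = PureWindowsPath if platform == "windows" else PurePosixPath
    plan: list[tuple[str, str]] = []
    for kind in order:
        if kind == "E":
            plan.append(("E", "(symbols in the executable)"))
        elif kind == "A":
            # Relative to the application path; ".." is kept, not resolved.
            plan += [("A", str(pure(app_dir) / rel)) for rel in CANDIDATES[platform]["A"]]
        elif kind == "S":
            # Bare names are resolved by the system loader (PATH, LD_LIBRARY_PATH).
            plan += [("S", name) for name in CANDIDATES[platform]["S"]]
        else:
            raise ValueError(f"unknown linking type {kind!r}")
    return plan
```

With search_plan("AES", "windows", "C:\\app") the first step is C:\app\pcre3.dll, and the executable's own symbols are only consulted after all application-relative candidates.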

If you also want to link PCRE statically into your own IDM application, note the following: if the application does not reference the PCRE functions itself, the library must be linked in completely (typical linker options are --whole-archive, +forceload or /OPT:NOREF), and it must be ensured that the PCRE functions can be found by the system-specific function pointer search (this may require exporting them from the application). However, ISA recommends linking via an external library (DLL, shared library), so that the PCRE version can easily be exchanged in your own product distribution.

The order for linking the PCRE library can be controlled by the application programmer through the interface function DM_Control or DM_ControlEx with the action DMF_PCREBinding.