RPG Cafe, Fall 2022: CHARCOUNT-NATURAL mode

News


Abstract

An RPG enhancement delivered through PTFs in the fall of 2022: CHARCOUNT NATURAL mode, which correctly handles data that has different-sized characters, such as UTF-8.

Content

Short URL: https://ibm.biz/rpgcafe_fall_2022_charcount

Fall 2022: CHARCOUNT NATURAL mode to correctly handle data that has different-sized characters, such as UTF-8

Warning

If you use the CHARCOUNT NATURAL support in your program, you need a runtime PTF on any system where you run the program. If the runtime PTF is not applied on the system where the program is running, the call to the program fails with MCH4437 saying that a program export is not found.

Details

Options to process strings by natural characters instead of bytes or double bytes

  • UTF-8 data can have characters with 1, 2, 3, or 4 bytes. 2-byte characters are common.
  • UTF-16 data can have characters with 2 or 4 bytes.
  • Mixed EBCDIC data can have characters with 1 or 2 bytes, and it contains "shift" characters to indicate the change between SBCS and DBCS data.
  • Mixed ASCII data can have characters with 1 or 2 bytes.

With the default standard-character-size mode (STDCHARSIZE), data is handled by bytes or double-bytes, without regard for the length of individual characters in the data.

With natural mode:

  • Most string built-in functions such as %SUBST, %SCAN, %XLATE, %SPLIT operate by characters rather than by bytes or double bytes. However, %LEN always works with the number of bytes or double bytes and %STR always works with the number of bytes. New built-in function %CHARCOUNT returns the number of characters in a string.
  • If data is truncated during assignment or parameter-passing, the truncation removes complete characters.
  • If data is truncated during I/O operations, the truncation removes complete characters.
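
As a rough illustration of the first two points, the following sketch contrasts the two modes for %SCAN and for truncation on assignment. The variable names (source, target, pos) are hypothetical. The values in the comments are what the byte layout of the UTF-8 data implies under the rules described here, assuming the compiler and runtime PTFs are applied.

           ctl-opt charcounttypes(*utf8);

           dcl-s source varchar(20) ccsid(*utf8);
           dcl-s target varchar(3) ccsid(*utf8);
           dcl-s pos int(10);

           source = 'abád';
           // 5 bytes, 4 characters (á is a 2-byte character)

           // Default standard-character-size mode: work in bytes
           pos = %scan('d' : source);
           // pos = 5 (byte position)
           target = source;
           // target has 3 bytes: "ab" and the first byte of á

           /charcount natural

           // Natural mode: work in complete characters
           pos = %scan('d' : source);
           // pos = 4 (character position)
           target = source;
           // target = "ab" (2 bytes); no partial á is kept

In standard-character-size mode, the truncated value ends with an incomplete UTF-8 character. This is the kind of corrupted data that natural mode is intended to prevent.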

There are several options to enable or disable CHARCOUNT NATURAL mode.

  • Specify Control keyword CHARCOUNT to set the initial CHARCOUNT mode.
  • Specify Control keyword CHARCOUNTTYPES to list the data types that you want to be handled in CHARCOUNT NATURAL mode.
  • Specify directive /CHARCOUNT NATURAL to set the mode to CHARCOUNT(*NATURAL) for statements following the directive.
  • Specify directive /CHARCOUNT STDCHARSIZE to set the mode to CHARCOUNT(*STDCHARSIZE) for statements following the directive.
  • Specify *NATURAL as the last parameter of a built-in function to enable CHARCOUNT NATURAL mode for the built-in function even if the data type of the operands is not relevant according to the CHARCOUNTTYPES keyword. Specify *STDCHARSIZE as the last parameter of a built-in function to disable CHARCOUNT NATURAL mode for the built-in function.
  • Specify File keyword CHARCOUNT(*NATURAL) to set CHARCOUNT NATURAL mode when data is moved from RPG variables to the output buffer and key buffer for the file. Specify File keyword CHARCOUNT(*STDCHARSIZE) to disable CHARCOUNT NATURAL mode when data is moved from RPG variables to the output buffer and key buffer for the file. The CHARCOUNT mode for a file defaults to the current CHARCOUNT mode for the definition statement for the file.

This example shows how you can operate in standard-character-size mode in general, and operate in natural mode only when you know the UTF-8 data might have multi-byte characters.

           dcl-s string varchar(20) ccsid(*utf8);
           dcl-s n int(10);
           dcl-s string2 varchar(20);

           string = 'ábcdë';

           n = %len(string);
           // n = 7 (á and ë are 2-byte characters)
           n = %charcount(string);
           // n = 5

           string2 = %subst(string : 1 : 3);
           // string2 = "áb"
           string2 = %subst(string : 1 : 3 : *natural);
           // string2 = "ábc"

This example shows how you can operate in natural mode by default for UTF-8 data. The example also shows how you can switch back to standard-character-size mode, although in general, you would switch to standard-character-size mode when you know you are working with data that has only 1-byte characters.

           ctl-opt charcounttypes(*utf8);
           ctl-opt charcount(*natural);

           dcl-s string varchar(20) ccsid(*utf8);
           dcl-s n int(10);
           dcl-s string2 varchar(20);

           string = 'ábcdë';

           n = %len(string);
           // n = 7 (á and ë are 2-byte characters)
           n = %charcount(string);
           // n = 5

           string2 = %subst(string : 1 : 3);
           // string2 = "ábc"
           string2 = %subst(string : 1 : 3 : *stdcharsize);
           // string2 = "áb"

           /charcount stdcharsize

           string2 = %subst(string : 1 : 3);
           // string2 = "áb"
           string2 = %subst(string : 1 : 3 : *natural);
           // string2 = "ábc"

PTFs for 7.4 and 7.5, available in December 2022

7.4:

  • ILE RPG compiler: 5770WDS SI81749
  • ILE RPG runtime: 5770WDS SI81729

7.5:

  • ILE RPG compiler: 5770WDS SI81801
  • ILE RPG compiler, TGTRLS(V7R4M0): 5770WDS SI81819
  • ILE RPG runtime: 5770WDS SI81740

The PTFs are also available with Db2 for i Fix Packs. See Db2 for IBM i 2022 PTF Group Schedule.

RDi support

A later update for RDi will support these enhancements.

Documentation

The 7.3, 7.4, and 7.5 ILE RPG Reference and ILE RPG Programmer's Guide are updated with full information about these enhancements. Start at the "What's new since 7.3", "What's new since 7.4", or "What's new since 7.5" section in the Reference.

Document Information

Modified date:
01 December 2022

UID

ibm16827067