MKEYED Files - Using With String Templates

Description

MKEYED files are record oriented files that provide the developer with a method for fast retrieval of records indexed by up to 16 different keys.

There are two types of MKEYED files: single-keyed and multi-keyed. It is important to understand the similarities and the differences between the two types of MKEYED files, as well as how they are handled when accessed via SQL statements within the language or by the ODBC/JDBC drivers (from now on referred to as the "SQL engine").

See MKEYED Files for additional information about MKEYED files.

Records

A record in an MKEYED file, and all other BASIS record oriented files, is a fixed size array of bytes. Each record in an MKEYED file is the same length. This length set at the time of file creation and is unchangeable. If there is ever a need to change the record size, create a new file with the new record size, and then copy all the records from the original file into the new file.

There is no physical connection between records and string templates. For example, if a string template has variable length fields, the record will still be the size defined at creation time. In addition, if using string templates to create and interpret record data, any change to the template definition also requires recreating and rewriting each record in the file forming all records properly. However, if all fields in the template are variable length, and the only change made to the template is in the maximum lengths of the fields, then all that is needed is a change in the record size as described above.

Keys

A key is a reference value used to lookup a particular record or list of records quickly. As with records, keys do not have any physical relationship to string templates. There can be up to 16 keys on each multi-keyed MKEYED file (only one key on single-keyed MKEYED files). Iteration over keyed files occurs on a single key chain at a time. Iteration on single-keyed files occurs on the only key available while iteration is possible on multi-keyed files on any one of its keys. Key values are stored in sorted order (this is the key chain), either ascending or descending. When an application reads a record, it repositions the file pointer to the next key (or previous key if the DIR= option specifies a negative number) in the key chain. The application reads the record corresponding to the next key value while performing another read.

When an application needs to access data in a file, it usually needs a single record, or set of records that meet some kind of specific criteria. Keys enable a developer (or the SQL engine) to jump to a particular position in the file and read only those records that meet the criteria, or at least limit the number of records read. Without keys, all records in a file would need to be read and evaluated to determine if they are actually necessary. This is an important concept to understand, especially when using the SQL engine. If a file does not have proper or adequate keys (or indices in SQL terminology), optimization with regard to retrieving records is not possible.

Single-keyed Files

Single-keyed MKEYED files have only one key. It is important to understand, however, that this is different from a multi-keyed MKEYED file with only one key.

Single-keyed files do not derive the key from the record. This is different from the way that multi-keyed files are handled (see Multi-keyed Files below). As a result, the key value has no physical connection to the record itself. Most developers that use single-keyed files tend to use a key value derived by using one or more fields from the record. While this is arguably the best approach to take, remember that this value is not physically derived from the record by the file system, but rather, by the developer somewhere in the application. Calls to WRITE RECORD (i.e. KEY="MyKey") specify the key value. This is the main difference between single-keyed and multi-keyed files. The record automatically derives the key values in multi-keyed files from the file system based on the key descriptions specified at the time of creation (see Multi-keyed Files below for complete details). Furthermore, the key on a single-keyed file does not have segments defined (as with multi-keyed files) since it has no physical reference to the record.

Using single-keyed MKEYED files with the SQL engine will work well as long as the key consists of values from one or more columns defined in the table. If this is the case, create the index definitions in the data dictionary using DD Builder, with the key dependant on the appropriate column definitions. If the key is not related in some way to columns in the table, then there will no way for the SQL engine to do any optimization of queries and will therefore perform very poorly.

Multi-keyed Files

Multi-keyed MKEYED files are quite different from single-keyed MKEYED files. These files are the most powerful and flexible files of those available in both PRO/5 and BBj (new file formats in BBj such as XKEYED and JKEYED (Deprecated) that do not have the key limitations of the MKEYED file format). Multi-keyed MKEYED files are the ideal file type to use when the SQL engine uses the files because their design works naturally in an SQL environment.

Multi-keyed files can have up to 16 different keys. Each key can have multiple segments defined with a total of 48 segments spread between all keys on the file. To provide fast access to individual or groups of records in the file, Multi-keyed files use keys sorted in either ascending or descending order just like single-keyed files. However, unlike single-keyed MKEYED files, the filesystem derives the key values when writing the file to disk from the actual records. This means that the key values are determined automatically, and linked to the record itself.

Each defined key links to the record by describing a portion(s) of the record. A segment within a key refers to a portion of the record from which the segment's value will be determined. As mentioned above, each file can have a maximum of 48 segments defined among all keys on the file.

Multi-keyed file key definitions do not simply specify a length as with single-keyed files. Each key definition in a multi-keyed file specifies three pieces of information about each segment within the key: field number, offset within the field, and length of the segment. There are two methods for defining keys: non-fielding and fielding. Non-fielding is the easiest method to understand, and it is the least prone to errors.

Field Number – Non-fielding

The key definition designates a field number of "0" when using non-fielding. Non-fielding should only be used when all of the fields in the string template are fixed length (i.e. no C(10*), C(10**), N(10*), etc.). Non-fielding ignores the idea of field number and relies solely on the offset within the field and length values, described in the next two sections.

Offset Within the Field – Non-fielding

The offset value refers to the location within the record where the key value should begin. When using non-fielding, the file system does not care what the byte values are within the record. The offset value is fixed to a particular location within the record.

Therefore, if the record layout is:

COL1:C(10),COL2:I(4),COL3:C(2)

an offset of 1 refers to the beginning of COL1, offset 11 refers to the beginning of COL2, and offset 15 refers to the beginning of COL3.

Length of Segment – Non-fielding

The length portion of a segment definition refers to the number of bytes that the segment should cover. Using the example from the previous section, an offset of 0 and length of 10 refers to the entire COL1 field. An offset of 11 and length of 6 refers to the COL2 and COL3 fields.

Field Number - Fielding

If the field number is improperly used, a key may not work correctly when accessed via the language, but especially by the SQL engine.

Use fielding when using variable length fields in the string template to define the record layout. Fielding works by considering a new field "found" when reaching a new-line character in the record (that is character $0A$). It is important to understand that fielding only works reliably on sections of the record that are guaranteed to contain new-line characters in expected locations (i.e. the end of variable length fields). For example, a field of data type "B" could possibly have a new-line character within its value since it is binary data and can contain any ASCII value (this is also true of fixed length character fields, although less common). If one record had a value containing a new-line character in that field, while another record did not, the two records would actually have different sections described as falling within fields located after the field containing the new-line. This scenario could cause serious problems when trying to use key values for locating records, especially in the SQL engine.

When writing a record to a multi-keyed MKEYED file, the key values come from the record, based on the contents of that record. If using fielding in the key definition, it will examine each byte of the record, starting with the first byte, one by one. When it reaches the first new-line character, it will have reached the end of the first "field." In this sense, it may correspond to one or more fields within the string template used to describe the record layout. It continues this process until it reaches the end of the record. For example, if the record was described using the following string template:

COL1:C(20*),COL2:C(10),COL3:C(20*)

COL1 is considered inside "field 1" while COL2 and COL3 is considered inside "field 2" (assuming COL2 does not contain any new-line characters). This is because COL2 does not have a field terminator, and therefore would not trigger the change to the next field (with regard to keys). The new-line character in COL3 would actually trigger the end of field 2. See the Examples section below.

Offset Within the Field - Fielding

The offset refers to the number of bytes from the beginning of the field where the segment should begin. This is usually 0, unless there are one or more fixed length fields present before a variable length field (i.e. our example in the Field Number - Fielding section above).

Length of Segment – Fielding

The length of segment value refers to the entire length of the defined key segment. This is not necessarily the same as the maximum length specified in the string template's variable length column definition. In other words, define the column large enough to hold any value in that field. For example, a field defined as C(20*) should have a segment defined with a length of 20 or higher. If only 5 bytes were in the record before the new-line end of field terminator, the key value would be padded with null bytes to fill the segment.

Examples

The following examples show several string templates with the appropriate key definitions for the resulting records. Use the key definitions in conjunction with the MKEYED Verb.

All of the examples below assume that fixed length character fields DO NOT contain any new-line characters.

Given the following template:

COL1:C(20),COL2:C(10),COL3:I(4),COL4:C(2)

The following key definition would create a key on COL1, and a second key on COL2 and COL3:

[0:1:20],[0:21:10]+[0:31:4]

Given the following template:

COL1:C(20*),COL2:C(10*),COL3:C(2*)

The following key definition would create a key on COL1, and a second key on COL2 and COL3:

[1:1:20],[2:1:10]+[3:1:2]

Given the following template:

COL1:C(20),COL2:C(10*),COL3:C(2*)

The following key definition would create a key on COL1, and a second key on COL2 and COL3:

[1:1:20],[1:21:10]+[2:1:2]

Given the following template:

COL1:C(20*),COL2:I(4),COL3:C(10*)

A potential serious problem would arise. Field 1 would refer to COL1 and field 2 would refer to a combination between COL2 and COL3 sometimes, and sometimes it could refer to COL2 only, or even the first byte of COL2 (if it were a new-line character). Avoid this type of layout at all cost.