MKEYED Files

MKEYED files are similar to regular keyed files with a few exceptions. MKEYED files may:

Grow dynamically by specifying a record count of 0.
Contain up to 16 keys per record.

A key segment may be defined as being unique, and/or as being in descending order. Records can never be accessed by record number. The MKEYED verb creates an MKEYED file.

By default, the MKEYED verb creates files that are limited to 2GB. PRO/5 2.x has the capability to use 4GB MKEYED files on operating systems and file systems that can support 4GB files. PRO/5 3.x, on supported platforms, supports true 64-bit files that can grow past the 4GB limit of the previous versions and are limited only by physical disk space. 64-bit MKEYED files can be created using SETOPTS 7 $80$. See the MKEYED Verb - Create MKEYED Files description for 64-bit file information. The mrebuild utility can be used to convert multi-keyed files from one format to another.

64-bit MKEYED files are not supported on Windows 95 or 98. Attempting to open or erase a 64-bit MKEYED file under Windows 95 or 98 will cause an !ERROR=153 (File system does not support large files). However, access to these 64-bit files through the PRO/5 Data Server is fully supported.

NOTE: If using string templates, please refer to MKEYED Files - Using With String Templates before proceeding. This topic provides additional information related to using string templates, including design considerations and implications when accessing MKEYED files via SQL statements whether from the language (via SQL verbs), or ODBC/JDBC drivers.

Single Key Mode

Single-key mode is almost identical to using DIRECT and SORT files. Since the READ, WRITE, and REMOVE commands are the same, most existing applications may use MKEYED files instead with no modification. The definition of a single-key MKEYED file is identical to that of a DIRECT file and has the following form:

MKEYED fileid,keysize,records,recsiz{,ERR=lineref}

Any DIRECT statement in an application may be changed to an MKEYED statement. Any SORT statement may be changed to an MKEYED statement, but a record size must be given as 0. (A SORT file is really nothing more than a DIRECT file with a record size of 0.) The FID() of an MKEYED file will indicate a file type of $06$. Any FILE statement defining a DIRECT or SORT file should be changed for this.

Accessing an MKEYED file is done identically to a DIRECT/SORT file except that physical indices are not meaningful. MKEYED files are very strict about referencing records by key and not by number. The IND() function will return a value that is not meaningful in the traditional sense. The IND= option is illegal and will generate an error. Any application accessing a DIRECT/SORT file by index will have to be changed before using MKEYED files.

Multiple Key Mode

In multi-key mode, each record may have up to 16 keys associated with it. One key per record is designated the "primary key" and must be unique among all primary keys in the file. The remaining keys are "alternate keys" and may contain duplicates.

Defining MKEYED Files in Multi-key Mode

Since it can be tedious for an application to be constantly providing up to 16 different key strings when writing to a file, PRO/5 extracts keys automatically from the data record. Therefore, definition of a multi-keyed file will contain information about where each key is located within the data record. Any key can be a single field or part of a field from the data record. Also, a key can be a composite of more than one field or parts of fields. A single field or part of a field is called a "segment." A key can be composed from one or more segments. The total size of a key cannot be more than 120 bytes. You can define a maximum of 48 segments from which keys are composed. The maximum number of keys per record could be as high as 16, but could be lower depending on the number of composite keys used.

The best way to explain this is with an example. Suppose you wanted a file called "xyz" with 1000 records, 80 bytes each. You decide that the first 5 bytes of the first field of the record will be a primary key. An alternate key may be found in the third field beginning with the fifth character and will be 10 bytes long. The definition for this file would be:

>MKEYED "xyz",[1:1:5],[3:5:10],1000,80

Please note the use of square brackets. This is how PRO/5 can tell a multi-key MKEYED file from a single-key MKEYED file. Any integer expression may be used within the brackets. The first value indicates a field number within the record. For purposes of scanning and extracting keys from records PRO/5 considers only the linefeed $0A$ character to be a field delimiter. The second value specifies a starting position within the field (with 1 being the first character). The third value specifies the maximum length of this particular key segment. If the end of the field is encountered before this length is satisfied then PRO/5 will assume nulls ($00$) for the remainder of the key segment. If the end of the field is encountered before the beginning position of the key segment, an error is issued. As a matter of practicality, all keys must be contained in the first 1024 bytes of a record. A key segment may be defined as being in a descending order as in:

>MKEYED "xyz",[1:1:5],[3:5:10:"D"],1000,80

Also, note that PRO/5 considers the first field in the record to be field 1. Field 0 may also be specified and is considered the entire record without regard to field delimiters. This should be used when non-fielded I/O is the normal mode of access to the file (READ RECORD/WRITE RECORD using string templates with all fields defined as fixed length, as opposed to READ/WRITE). Extracting keys using non-fielded keys in this manner is also more efficient since there is no need to scan for field delimiters. If no field number is given (PRO/5 considers it optional) then it defaults to 0. However, it is a good idea to always provide the field number to avoid possible confusion.
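The extraction rules described above can be sketched in Python as a model of the behavior. This is not PRO/5 code: the function name extract_segment, the sample record, and the use of ValueError for the error case are assumptions made for illustration, and the exact boundary at which PRO/5 reports "field ends before the segment begins" is interpreted here as a starting position beyond the last character of the field.

```python
FIELD_DELIMITER = b"\n"  # $0A$, the only delimiter considered when scanning for keys


def extract_segment(record: bytes, field: int, start: int, length: int) -> bytes:
    """Model of extracting one [field:start:length] key segment from a record."""
    if field == 0:
        # Field 0 means the entire record, without regard to field delimiters.
        source = record
    else:
        fields = record.split(FIELD_DELIMITER)
        if field > len(fields):
            raise ValueError(f"record has no field {field}")
        source = fields[field - 1]  # the first field in the record is field 1
    if start > len(source):
        # Assumed reading of: end of field encountered before the segment start.
        raise ValueError("field ends before the start of the key segment")
    segment = source[start - 1 : start - 1 + length]
    # A short field is padded with nulls ($00$) up to the segment length.
    return segment.ljust(length, b"\x00")


# Hypothetical record with four linefeed-terminated fields.
record = b"CUST1-EXTRA\nx\n1234SMITH\nNY\n"

primary = extract_segment(record, 1, 1, 5)     # [1:1:5]  -> b"CUST1"
alternate = extract_segment(record, 3, 5, 10)  # [3:5:10] -> b"SMITH" plus 5 nulls
whole = extract_segment(record, 0, 1, 5)       # [0:1:5]  -> b"CUST1"
```

The field 0 case skips the split entirely, which mirrors the remark above that non-fielded keys avoid the scan for field delimiters.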

At this point, the concept of key number should be discussed. Since there may be several keys per record, for some operations it is important to tell PRO/5 which key you mean. The order in which the keys are defined in the MKEYED statement is important. The first key defined is the primary key and is referred to as key #0. The remaining keys are alternate keys and are referred to as key #1, key #2, and so on.

Now, let's say that because the customer had a last minute change of mind, the alternate key must also contain the first 5 characters of the fourth field of the record. Our MKEYED statement will now look like this:

>MKEYED "xyz",[1:1:5],[3:5:10]+[4:1:5],1000,80

The alternate key is now a composite of 2 segments. Note that a plus sign was used to indicate this. Up to 16 keys composed of any number of segments may be specified as long as the total number of segments does not exceed 48.
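The limits on key definitions (16 keys, 48 segments in total, 120 bytes per key) can be checked with a short Python model. This is a sketch for illustration only, not PRO/5 code: the function name validate_key_definitions and the list-of-tuples representation are assumptions, and the size of a composite key is taken to be the sum of its segment lengths.

```python
MAX_KEYS = 16        # keys per record
MAX_SEGMENTS = 48    # segments across all keys
MAX_KEY_BYTES = 120  # total size of one key


def validate_key_definitions(keys):
    """Check a list of keys, each a list of (field, start, length) segments.

    Returns the size in bytes of each key, indexed by key number.
    """
    if not 1 <= len(keys) <= MAX_KEYS:
        raise ValueError(f"between 1 and {MAX_KEYS} keys are allowed")
    total_segments = sum(len(segments) for segments in keys)
    if total_segments > MAX_SEGMENTS:
        raise ValueError(f"more than {MAX_SEGMENTS} segments defined")
    sizes = []
    for knum, segments in enumerate(keys):
        if not segments:
            raise ValueError(f"key #{knum} has no segments")
        size = sum(length for _field, _start, length in segments)
        if size > MAX_KEY_BYTES:
            raise ValueError(f"key #{knum} is {size} bytes, limit is {MAX_KEY_BYTES}")
        sizes.append(size)
    return sizes


# The definition [1:1:5],[3:5:10]+[4:1:5] from the example above.
xyz_keys = [
    [(1, 1, 5)],              # key #0, the primary key
    [(3, 5, 10), (4, 1, 5)],  # key #1, a composite of two segments
]
sizes = validate_key_definitions(xyz_keys)  # [5, 15]
```

Because composite keys consume several segments each, a file whose keys all use four segments would reach the 48-segment limit at 12 keys, which is why the effective maximum number of keys can be lower than 16.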

A final note on key definitions: if this form of the MKEYED statement is used but defines ONLY ONE KEY, the file is still considered a multi-keyed file, because brackets were used and the key is contained in the record.

Writing Records in Multi-key Mode

Since all keys are defined in the record, there is no need to use the KEY= option on the WRITE statement. If you are accustomed to single-key files this may take a little getting used to. A KEY= option on a WRITE is ignored in PRO/5.

Either a WRITE or WRITE RECORD may be used. For example, assuming our file has been opened on channel 1, our WRITE statement may look like this:

>WRITE (1)A$,B$,C$,D$,E$

At this point, PRO/5 will determine the primary key and try to locate that key in the file. If it is not found, then PRO/5 assumes you are adding a NEW RECORD in which case the primary key and all alternate keys are added to the file. If any of the alternate keys flagged as unique keys exist, then PRO/5 will generate a DUPLICATE KEY error. If the primary key already exists in the file then PRO/5 assumes you are REPLACING AN EXISTING RECORD. In this case a DOM= option may take effect if it were specified. If not, then PRO/5 will read in the old record and compare it with the new record. Any alternate keys that have been modified will be re-keyed, the old key value removed, and the new key value added. If a duplicate key is found in an alternate key that has been flagged as being unique, a DUPLICATE KEY error will be generated.
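The add-or-replace decision described above can be modelled in a few lines of Python. This is an illustrative sketch, not PRO/5 code and not a description of the actual file structure: the class MultiKeyModel, the DuplicateKeyError exception, and the representation of records as tuples of strings are all assumptions, and the DOM= branch is deliberately left out.

```python
class DuplicateKeyError(Exception):
    """Stands in for the DUPLICATE KEY error described in the text."""


class MultiKeyModel:
    def __init__(self, extractors, unique=()):
        self.extractors = extractors  # extractors[0] yields the primary key
        self.unique = set(unique)     # alternate key numbers flagged as unique
        self.records = {}             # primary key -> record
        # One set of (key value, primary key) pairs per key number; index 0 unused.
        self.alternates = [set() for _ in extractors]

    def write(self, record):
        keys = [extract(record) for extract in self.extractors]
        primary = keys[0]
        old = self.records.get(primary)
        old_keys = None if old is None else [e(old) for e in self.extractors]
        # Reject a new or changed value that collides with a unique alternate key.
        for knum in range(1, len(keys)):
            changed = old_keys is None or old_keys[knum] != keys[knum]
            if knum in self.unique and changed:
                if any(value == keys[knum] for value, _ in self.alternates[knum]):
                    raise DuplicateKeyError(f"key #{knum}: {keys[knum]!r}")
        for knum in range(1, len(keys)):
            if old_keys is not None:
                if old_keys[knum] == keys[knum]:
                    continue  # unmodified alternate keys are left alone
                self.alternates[knum].discard((old_keys[knum], primary))
            self.alternates[knum].add((keys[knum], primary))
        self.records[primary] = record
        return "added" if old is None else "replaced"


# Keys modelled on [1:1:5] and [3:5:10], with the alternate flagged unique.
model = MultiKeyModel([lambda r: r[0][:5], lambda r: r[2][4:14]], unique={1})
first = model.write(("CUST1", "a", "1234SMITH"))   # "added"
second = model.write(("CUST2", "b", "1234JONES"))  # "added"
third = model.write(("CUST1", "c", "1234BROWN"))   # "replaced", SMITH re-keyed to BROWN
```

Writing a record with a new primary key but the alternate value JONES would raise DuplicateKeyError in this model, while rewriting CUST2 with JONES unchanged would not.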

Reading Records in Multi-key Mode

Any record may be accessed by either primary or alternate keys. When reading a multi-keyed file PRO/5 keeps track of a "default key number." The default key number is set to key #0 when the file is OPENed. It may be changed only by a KNUM= option on a READ or EXTRACT statement. For example:

>READ (1,KEY=K$,KNUM=1)A$,B$,C$,D$,E$

The above READ will set our default key number to 1 and try to locate KEY=K$ in the alternate key set. If no KNUM= is given then the current default KNUM is assumed. If the key is found, then the corresponding record is read. When randomly accessing keys in an alternate key set where several records share the key K$, the first record encountered with KEY=K$ will be returned. Subsequent sequential READs will access the rest of those records. The order in which records with duplicate keys are encountered is not defined. If the key is not found, then the !ERROR=11 case applies exactly as it does for DIRECT/SORT files. If KNUM= is given but no KEY=, then the file pointer is set to the first record in the file according to the sort sequence specified by KNUM=.

When doing a sequential READ such as:

>READ (1,END=7000)A$,B$,C$,D$,E$

the records will be accessed by the sort associated with the current KNUM value. For example, a program to produce a report from our file in alternate sort sequence could look like this:

1000 OPEN (1)"xyz"
1020 EXTRACT (1,KNUM=1)
1030 REM - MAIN LOOP
1040 READ (1,END=2000)A$,B$,C$,D$,E$
1050 … (print out)
1060 GOTO 1040
2000 REM - END OF FILE REACHED
2010 …

Please note that accessing records by key in an alternate key chain may result in looping if you use a construction similar to:

1000 REM - Set up for loop
1010 READ (1,KNUM=1)
1020 K$=KEYF(1,END=2000)
1030 REM - MAIN LOOP
1040 READ (1,KEY=K$,END=2000)A$,B$,C$,D$,E$; K$=KEY(1)
1050 … (print out)
1060 GOTO 1040
2000 REM - END OF FILE REACHED
2010 …

Because this loop retrieves each key from an alternate chain, it is possible to encounter duplicate keys. Once one is encountered, the READ will re-position the file pointer to the first key in the duplicate key chain, causing the KEY() function to keep retrieving the same key over and over again. Reading sequentially after setting the KNUM avoids the problem:

1000 REM - Set up for loop
1010 EXTRACT (1,KNUM=1)
1030 REM - MAIN LOOP
1040 READ (1,END=2000)A$,B$,C$,D$,E$
1050 … (print out)
1060 GOTO 1040
2000 REM - END OF FILE REACHED
2010 …
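The difference between the two loops can be illustrated with a small Python model of an alternate key chain. This is a simplified sketch, not PRO/5 code: the chain is represented as a sorted list of hypothetical key values, the function names are invented for illustration, and a cap on the number of reads is added so that the looping case terminates.

```python
def keyed_loop(chain, max_reads=10):
    """Model of READ (1,KEY=K$) followed by K$=KEY(1); returns positions read."""
    reads = []
    key = chain[0] if chain else None  # K$=KEYF(1)
    while key is not None and len(reads) < max_reads:
        # A keyed READ lands on the FIRST entry with this key value.
        index = chain.index(key)
        reads.append(index)
        pointer = index + 1
        # KEY(1) returns the key at the file pointer, or signals end of file.
        key = chain[pointer] if pointer < len(chain) else None
    return reads


def sequential_loop(chain):
    """Model of plain READ (1,END=...) after the KNUM has been set."""
    reads = []
    pointer = 0
    while pointer < len(chain):
        reads.append(pointer)
        pointer += 1  # each READ advances the file pointer by one record
    return reads


chain = ["ADAMS", "BAKER", "BAKER", "CLARK"]  # alternate keys with one duplicate

stuck = keyed_loop(chain, max_reads=6)  # [0, 1, 1, 1, 1, 1]
clean = sequential_loop(chain)          # [0, 1, 2, 3]
```

In the model, the keyed loop never gets past the first BAKER entry because each keyed read returns to it, while the sequential loop visits every entry once, including both duplicates.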

Removing Records in Multi-key Mode

Removing records from a multi-keyed file is done the same as with DIRECT/SORT files. The KEY= clause must be given and indicates the record to be removed by its primary key regardless of the current default KNUM.

>REMOVE (1,KEY=K$)

In any remove from an MKEYED file in multi-key mode, the key removed must be from the primary chain.

Accessing Keys in Multi-key Mode

The key functions (KEY(), KEYP(), KEYN(), KEYF(), KEYL()) behave as they do with DIRECT/SORT files except that they apply to the current default KNUM. For example, if the alternate key set is the current default (because of a prior READ with KNUM=1), then KEYF(1) will return the first key in the alternate sort, KEYP(1) will return the prior key with respect to the alternate sort, and so on. KEY(1) will return the current key from the current record. Since there are times when it is desirable to examine other keys from the current record, KNUM= may be given when using KEY(). For example, KEY(1,KNUM=0) will return the primary key from the current record. The use of KNUM= in the KEY() function does NOT change your default KNUM. The default can only be changed by a READ or EXTRACT statement.

FID(), FIN() and FILE

As mentioned earlier, the file type returned in the FID() information for MKEYED files is $06$. The question now arises: how can a single-key MKEYED file be told from a multi-keyed MKEYED file? Furthermore, if multi-keyed, where is all that key definition information kept?

For single-keyed files the key size field in the FID() contains the usual key size information. For multi-keyed files, the key size field of the FID() will contain a $00$ and the key information will come back through the FIN(), beginning at position 86. The following are the special fields returned by the FIN():

Position	Description
65,20	Same as in a DIRECT file
85,1	Current KNUM value
86	Key definition information

The key definition information consists of 48 8-byte entries. Each entry has the following fields:

Bytes	Contents
1,1	Key number or $FF$ for the end of definitions
2,1	Field in record to use or $00$ for no fielding
3,2	Offset in field/record (0 based, 1024 maximum)
5,1	Length of this segment
6,1	Segment modification bits:
	$01$ - descending segment
	$02$ - unique flag
	$04$ - Business Math flag
7,2	$0000$

Following the last segment definition, the key number byte of the next entry must contain $FF$ to indicate the end of the segment list.
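The entry layout in the table above can be decoded with a short Python sketch. This is an illustration rather than PRO/5 code, and several details are assumptions: the function name parse_key_definitions, the big-endian byte order of the 2-byte offset, and the sample bytes, which were built by hand to resemble the [1:1:5],[3:5:10]+[4:1:5] definition and are not real FIN() output.

```python
import struct

ENTRY_SIZE = 8
MAX_ENTRIES = 48
END_MARKER = 0xFF

FLAG_DESCENDING = 0x01
FLAG_UNIQUE = 0x02
FLAG_BUSINESS_MATH = 0x04


def parse_key_definitions(data: bytes):
    """Decode 8-byte segment entries until the $FF$ end marker is found."""
    segments = []
    for i in range(MAX_ENTRIES):
        entry = data[i * ENTRY_SIZE : (i + 1) * ENTRY_SIZE]
        if len(entry) < ENTRY_SIZE or entry[0] == END_MARKER:
            break
        # Assumed layout: key number, field, 2-byte offset (big-endian here),
        # segment length, flag bits, and two reserved bytes.
        knum, field, offset, length, flags, _reserved = struct.unpack(">BBHBBH", entry)
        segments.append({
            "knum": knum,
            "field": field,    # 0 means no fielding
            "offset": offset,  # 0 based, unlike the 1-based start in MKEYED
            "length": length,
            "descending": bool(flags & FLAG_DESCENDING),
            "unique": bool(flags & FLAG_UNIQUE),
            "business_math": bool(flags & FLAG_BUSINESS_MATH),
        })
    return segments


# Hand-built sample data; the flag values are illustrative only.
sample = (
    bytes([0, 1, 0, 0, 5, 0x00, 0, 0])      # key #0: field 1, offset 0, length 5
    + bytes([1, 3, 0, 4, 10, 0x01, 0, 0])   # key #1: field 3, offset 4, length 10, descending
    + bytes([1, 4, 0, 0, 5, 0x00, 0, 0])    # key #1: field 4, offset 0, length 5
    + bytes([END_MARKER]) + bytes(7)        # end of definitions
)
segments = parse_key_definitions(sample)
```

Entries that share a key number belong to the same composite key, so grouping the decoded segments by "knum" recovers the key structure of the file.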

The FILE verb can be used to create a multi-keyed MKEYED file by specifying the optional second string, which contains the key information from the FIN(). For example, to erase and re-define a multi-keyed MKEYED file:

>OPEN (1)"xyz"
>LET F$=FID(1),Z$=FIN(1)
>CLOSE (1)
>ERASE "xyz"
>FILE F$,Z$(86)

Using MKEYED Files

The following points may help you decide whether to use DIRECT/SORT or MKEYED for a file:

An MKEYED file will dynamically allocate disk space as it is needed when defined with 0 records. This makes it possible to define a file with an extremely large record capacity without instantly using up a lot of disk space. However, a full MKEYED file may use MORE disk space than a DIRECT/SORT file of the same capacity.
New keys may be added to a large MKEYED file usually FASTER than to a large DIRECT/SORT file.
Sequentially scanning an MKEYED file will be somewhat SLOWER than a DIRECT/SORT file.
Randomly locating a key in a large MKEYED file may require more disk accesses than with a DIRECT/SORT file. However, on the average there probably will not be any real difference.
A damaged MKEYED file is harder to reconstruct than a damaged DIRECT/SORT file. This requires more control over backups, etc.

A good rule of thumb would be: The more volatile the file, the more likely an MKEYED file will be preferred for performance. The more static the file, the more likely a DIRECT/SORT file will be preferred.

Finally, when building an MKEYED file (or DIRECT/SORT file) always LOCK the file if it is practical to do so. Since PRO/5 will know it has exclusive access to the file, it will bypass the usual locking checks on every access resulting in a SUBSTANTIAL improvement in performance.

An MKEYED file may be created with no record data (similar to a SORT file), with record data and a user specified key (similar to a DIRECT file), or with a record and multiple keys (MKEYED files).

Corruption Recovery Format MKEYED Files

MKEYED files can be created in or converted to a format that allows them to be easily recovered if they become corrupted. The mkconvert utility is a stand-alone executable that converts existing files, while new files created with SETOPTS 7 $20$ automatically have this format.

For each stored record in a corruption recovery format file, the new format adds a four-byte tag, key data for a single-keyed MKEYED file, and a four-byte checksum of the key and record data used to detect the actual data corruption of the record. This information enables the efficient recovery of record and key data, even if the data search tree is corrupt or missing.

The tag, key data, and checksum information cause the new format file to be larger than the original file. The size increase, however, is proportional to the number of records in the file.
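As a rough illustration of that proportionality, the added bytes can be estimated in Python from the per-record items listed above. This is a back-of-the-envelope sketch under stated assumptions: the function name recovery_overhead is invented, the stored key data is assumed to occupy the full defined key size, and any other differences between the two formats are ignored.

```python
TAG_BYTES = 4       # four-byte tag added to each record
CHECKSUM_BYTES = 4  # four-byte checksum of the key and record data


def recovery_overhead(records: int, key_size: int = 0) -> int:
    """Estimate the extra bytes added by the corruption recovery format.

    key_size is the key length of a single-keyed file. Leave it at 0 for a
    multi-keyed file, where the keys are already contained in the record.
    """
    return records * (TAG_BYTES + key_size + CHECKSUM_BYTES)


single_keyed = recovery_overhead(1000, key_size=10)  # 1000 * 18 = 18000 bytes
multi_keyed = recovery_overhead(1000)                # 1000 * 8 = 8000 bytes
```

Doubling the number of records doubles the estimate, which is the sense in which the size increase is proportional to the record count.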