MKEYED Files
MKEYED files are similar to regular keyed files, with a few exceptions. MKEYED files may:
- Grow dynamically when created with a record count of 0.
- Contain up to 16 keys per record. A key segment may be defined as unique, as descending, or both.
Records in an MKEYED file can never be accessed by record number. The MKEYED verb creates an MKEYED file.
By default, the MKEYED verb creates files limited to 2GB. PRO/5 2.x can use
4GB MKEYED files on operating systems and filesystems that support 4GB files.
PRO/5 3.x, on supported platforms, supports true 64-bit files that can
grow past the 4GB limit of previous versions and are limited only by
physical disk space. 64-bit MKEYED files can be created using SETOPTS 7 $80$.
See the MKEYED Verb - Create MKEYED Files description for 64-bit file
information. The mrebuild utility can be used to convert multi-keyed files
from one format to another.
64-bit MKEYED files are not supported on Windows 95 or 98. Attempting to
open or erase a 64-bit MKEYED file under Windows 95 or 98 will cause an
!ERROR=153 (File system does not support large files). However, access to
these 64-bit files through the PRO/5 Data Server is fully supported.
NOTE: If using string templates, please refer to MKEYED Files - Using With String Templates before proceeding. This topic provides additional information related to using string templates, including design considerations and implications when accessing MKEYED files via SQL statements whether from the language (via SQL verbs), or ODBC/JDBC drivers.
Single Key Mode
Single-key mode is almost identical to the DIRECT and SORT files. Since the READ, WRITE, and REMOVE commands are the same, most existing applications may use MKEYED files instead with no modification. Definition of a single-key MKEYED file is identical to that of a DIRECT file and has the following form:
MKEYED fileid,keysize,records,recsiz{,ERR=lineref}
Any DIRECT statement in an application may be changed to an MKEYED statement.
Any SORT statement may be changed to an MKEYED statement, but a record
size must be given as 0. (A SORT file is really nothing more than a DIRECT
file with a record size of 0.) The FID() of an MKEYED file will indicate
a file type of $06$. Any FILE statement defining a DIRECT or SORT file
should be changed for this.
Accessing an MKEYED file is done identically to a DIRECT/SORT file except
that physical indices are not meaningful. MKEYED files are very strict
about referencing records by key and not by number. The IND() function
will return a value that is not meaningful in the traditional sense. The
IND= option is illegal and will generate an error. Any application accessing
a DIRECT/SORT file by index will have to be changed before using MKEYED
files.
Multiple Key Mode
In multi-key mode, each record may have up to 16 keys associated with it. One key per record is designated the "primary key" and must be unique among all primary keys in the file. The remaining keys are "alternate keys" and may contain duplicates.
Defining MKEYED Files in Multi-key Mode
Since it can be tedious for an application to be constantly providing
up to 16 different key strings when writing to a file, PRO/5 extracts
keys automatically from the data record. Therefore, definition of a multi-keyed
file will contain information about where each key is located within the
data record. Any key can be a single field or part of a field from the
data record. Also a key can be a composite of more than one field or parts
of fields. A single field or part of a field is called a "segment."
A key can be composed from one or more segments. The total size of a key
cannot be more than 120 bytes. You can define a maximum of 48 segments
from which keys are composed. The maximum number of keys per record could
be as high as 16, but could be lower depending on the number of composite
keys used.
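The limits above can be checked mechanically before a file is defined. The following is a minimal Python sketch, not PRO/5 code: it models a key definition as a list of keys, each a list of (field, start, length) segments, and assumes that the size of a key is the sum of its segment lengths. The name `check_definition` is hypothetical.

```python
MAX_KEYS = 16          # keys per record
MAX_SEGMENTS = 48      # segments across all keys
MAX_KEY_BYTES = 120    # total size of one key


def check_definition(keys):
    """Return a list of limit violations for a multi-key definition.

    `keys` is a list of keys; each key is a list of
    (field, start, length) segment tuples.
    """
    problems = []
    if len(keys) > MAX_KEYS:
        problems.append(f"{len(keys)} keys exceeds {MAX_KEYS}")
    total_segments = sum(len(key) for key in keys)
    if total_segments > MAX_SEGMENTS:
        problems.append(f"{total_segments} segments exceeds {MAX_SEGMENTS}")
    for index, key in enumerate(keys):
        # Assumption: key size is the sum of its segment lengths.
        size = sum(length for _field, _start, length in key)
        if size > MAX_KEY_BYTES:
            problems.append(f"key {index} is {size} bytes, exceeds {MAX_KEY_BYTES}")
    return problems


# A one-segment key followed by a two-segment composite key.
definition = [[(1, 1, 5)], [(3, 5, 10), (4, 1, 5)]]
print(check_definition(definition))
```

An empty list means the definition stays within all three limits. Because composite keys consume several segments each, a definition can hit the 48-segment limit before it reaches 16 keys.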
The best way to explain this is with an example. Suppose you wanted a file
called "xyz" with 1000 records, 80 bytes each. You decide that
the first 5 bytes of the first field of the record will be a primary key.
An alternate key may be found in the third field beginning with the fifth
character and will be 10 bytes long. The definition for this file would
be:
>MKEYED "xyz",[1:1:5],[3:5:10],1000,80
Please note the use of square brackets. This is how PRO/5 can tell a multi-key MKEYED file from a single-key MKEYED file. Any integer expression may be used within the brackets.
The first value indicates a field number within the record. For purposes of scanning and extracting keys from records, PRO/5 considers only the linefeed $0A$ character to be a field delimiter. The second value specifies a starting position within the field (with 1 being the first character). The third value specifies the maximum length of this particular key segment. If the end of the field is encountered before this length is satisfied, then PRO/5 will assume nulls ($00$) for the remainder of the key segment. If the end of the field is encountered before the beginning position of the key segment, an error is issued.
As a matter of practicality, all keys must be contained in the first 1024 bytes of a record. A key segment may be defined as being in a descending order as in:
>MKEYED "xyz",[1:1:5],[3:5:10:"D"],1000,80
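The extraction rules for a single [field:start:length] segment can be illustrated with a short Python model. This is a sketch of the behavior described above, not PRO/5 code: the function name `extract_segment` and the sample record are hypothetical, and treating a start position beyond the last character of the field (or a field the record does not have) as the error case is an assumption. The "D" descending flag is not modeled.

```python
def extract_segment(record: bytes, field: int, start: int, length: int) -> bytes:
    """Extract one key segment from a record.

    Fields are numbered from 1 and are delimited only by linefeed ($0A$).
    `start` is 1-based. A field that is too short is padded with nulls ($00$).
    """
    fields = record.split(b"\n")
    if field < 1 or field > len(fields):
        raise ValueError(f"record has no field {field}")
    data = fields[field - 1]
    if start > len(data):
        # Assumption: the field ended before the segment's starting position.
        raise ValueError(f"field {field} ends before position {start}")
    segment = data[start - 1:start - 1 + length]
    return segment.ljust(length, b"\x00")


# Hypothetical record with three linefeed-terminated fields.
record = b"CUST01234\nAcme Tools\nWidget, large\n"
primary = extract_segment(record, 1, 1, 5)      # models [1:1:5]
alternate = extract_segment(record, 3, 5, 10)   # models [3:5:10]
print(primary, alternate)
```

In this sample the third field has only nine characters from position 5 onward, so the alternate segment is padded with one null byte to reach its defined length of 10.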
Also, note that PRO/5 considers the first field in the record to be
field 1. Field 0 may also be specified and is considered the entire record
without regard to field delimiters. This should be used when non-fielded
I/O is the normal mode of access to the file (READ RECORD/WRITE RECORD
using string templates with all fields defined as fixed length, as opposed
to READ/WRITE). Extracting keys using non-fielded keys in this manner
is also more efficient since there is no need to scan for field delimiters.
If no field number is given (PRO/5 considers it optional) then it defaults
to 0. However, it is a good idea to always provide the field number to
avoid possible confusion.
At this point, the concept of key number should be discussed. Since there
may be several keys per record, for some operations it is important to
tell PRO/5 which key you mean. The order in which the keys are defined
in the MKEYED statement is important. The first key defined is the primary
key and is referred to as key #0. The remaining keys are alternate keys
and are referred to as key #1, key #2, and so on.
Now, let's say that because the customer had a last minute change of mind,
the alternate key must also contain the first 5 characters of the fourth
field of the record. Our MKEYED statement will now look like this:
>MKEYED "xyz",[1:1:5],[3:5:10]+[4:1:5],1000,80
The alternate key is now a composite of 2 segments. Note that a plus
sign was used to indicate this. Up to 16 keys composed of any number of
segments may be specified as long as the total number of segments does
not exceed 48.
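A composite key can be pictured as its segments joined in the order they are written in the definition. The Python sketch below models that, including field 0 (the whole record). It is not PRO/5 code: the helper names and the sample record are hypothetical, and plain concatenation of segments is an assumption based on the plus-sign notation.

```python
def extract_segment(record, field, start, length):
    """Return one segment; field 0 means the whole record, no delimiters."""
    if field == 0:
        data = record
    else:
        fields = record.split(b"\n")  # only $0A$ delimits fields
        if field > len(fields):
            raise ValueError(f"record has no field {field}")
        data = fields[field - 1]
    if start > len(data):
        raise ValueError(f"field {field} ends before position {start}")
    return data[start - 1:start - 1 + length].ljust(length, b"\x00")


def extract_keys(record, definition):
    """Return [key #0, key #1, ...] for a record.

    `definition` is a list of keys, each a list of (field, start, length)
    segments. Assumption: a composite key is its segments concatenated.
    """
    return [
        b"".join(extract_segment(record, *segment) for segment in key)
        for key in definition
    ]


# Models: MKEYED "xyz",[1:1:5],[3:5:10]+[4:1:5],1000,80
definition = [[(1, 1, 5)], [(3, 5, 10), (4, 1, 5)]]
record = b"CUST01234\nAcme Tools\nWidget, large\nBLUE-42\n"
keys = extract_keys(record, definition)
print(keys)
```

Key #0 comes from the first field alone, while key #1 is 15 bytes long: 10 bytes from the third field followed by 5 bytes from the fourth.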
A final note on key definitions: if this form of MKEYED statement is used
but defines ONLY ONE KEY, it is still considered a multi-keyed file, because
brackets were used and the key is contained in the record.
Writing Records in Multi-key Mode
Since all keys are defined in the record, there is no need to use the
KEY= option on the WRITE statement. If you are accustomed to single-key
files, this may take a little getting used to. A KEY= option on a WRITE is
ignored in PRO/5.
Either a WRITE or WRITE RECORD may be used. For example, assuming our file
has been opened on channel 1, our WRITE statement may look like this:
>WRITE (1)A$,B$,C$,D$,E$
At this point, PRO/5 will determine the primary key and try to locate that key in the file.
If it is not found, then PRO/5 assumes you are adding a NEW RECORD, in which case the primary key and all alternate keys are added to the file. If any alternate key that is flagged as unique already exists in the file, then PRO/5 will generate a DUPLICATE KEY error.
If the primary key already exists in the file, then PRO/5 assumes you are REPLACING AN EXISTING RECORD. In this case a DOM= option may take effect if it was specified. If not, then PRO/5 will read in the old record and compare it with the new record. Any alternate keys that have been modified will be re-keyed: the old key value is removed and the new key value is added. If a duplicate key is found in an alternate key that has been flagged as unique, a DUPLICATE KEY error will be generated.
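The add-or-replace decision can be sketched with a small in-memory Python model. This is an illustration of the rules above under simplifying assumptions, not how PRO/5 stores data: the class and function names are hypothetical, records are kept in a dictionary by primary key, and the DOM= option is not modeled.

```python
class DuplicateKeyError(Exception):
    """Raised when a unique alternate key is already used by another record."""


class MultiKeyModel:
    def __init__(self, extract_keys, unique_knums=()):
        self.extract_keys = extract_keys        # record -> [key #0, key #1, ...]
        self.unique_knums = set(unique_knums)   # alternate keys flagged unique
        self.records = {}                       # primary key -> record

    def write(self, record):
        keys = self.extract_keys(record)
        primary = keys[0]
        for other_primary, other in self.records.items():
            if other_primary == primary:
                continue  # the record being replaced cannot clash with itself
            other_keys = self.extract_keys(other)
            for knum in self.unique_knums:
                if keys[knum] == other_keys[knum]:
                    raise DuplicateKeyError(f"key #{knum} duplicates an existing record")
        action = "replaced" if primary in self.records else "added"
        # Storing the new record drops the old alternate key values with it.
        self.records[primary] = record
        return action


def demo_keys(record):
    # Hypothetical layout: bytes 1-5 primary key, bytes 6-10 alternate key.
    return [record[:5], record[5:10]]


model = MultiKeyModel(demo_keys, unique_knums={1})
print(model.write(b"00001SMITH"))  # primary key not found: new record
print(model.write(b"00001JONES"))  # primary key found: record replaced
```

After the second write, the alternate key value SMITH is no longer in use, so another record could take it, while a different record carrying JONES would raise the duplicate key error.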
Reading Records in Multi-key Mode
Any record may be accessed by either primary or alternate keys. When reading a multi-keyed file, PRO/5 keeps track of a "default key number." The default key number is set to key #0 when the file is OPENed. It may be changed only by a KNUM= option on a READ or EXTRACT statement. For example:
>READ (1,KEY=K$,KNUM=1)A$,B$,C$,D$,E$
The above READ will set our default key number to 1 and try to locate
KEY=K$ in the alternate key set. If no KNUM is given then the current
default KNUM is assumed. If the key is found, then the corresponding record
is read. When randomly accessing keys in an alternate key set and there
are duplicate keys with K$, the first record encountered with KEY=K$ will
be returned. Subsequent sequential READs will access the rest of those
records. The order in which records with duplicate keys are encountered
is not defined. If not found, then the !ERROR=11
case applies exactly the same as DIRECT/SORT. If KNUM= is given but no
KEY=, then the file pointer is set to the first record in the file according
to the sort sequence specified by KNUM=.
When doing a sequential READ such as:
>READ (1,END=7000)A$,B$,C$,D$,E$
the records will be accessed by the sort associated with the current KNUM value. For example, a program to produce a report from our file in alternate sort sequence could look like this:
1000 OPEN (1)"xyz"
1020 EXTRACT (1,KNUM=1)
1030 REM - MAIN LOOP
1040 READ (1,END=2000)A$,B$,C$,D$,E$
1050 … (print out)
1060 GOTO 1040
2000 REM - END OF FILE REACHED
2010 …
Please note that accessing records by key in an alternate key chain may result in looping if you use a construction similar to:
1000 REM - Set up for loop
1010 READ (1,KNUM=1)
1020 K$=KEYF(1,END=2000)
1030 REM - MAIN LOOP
1040 READ (1,KEY=K$,END=2000)A$,B$,C$,D$,E$; K$=KEY(1)
1050 … (print out)
1060 GOTO 1040
2000 REM - END OF FILE REACHED
2010 …
As this loop is retrieving a key from an alternate chain, it is possible to encounter duplicate keys. Once one is encountered, the READ will re-position the file pointer to the first key in the duplicate key chain, causing the KEY() function to keep retrieving the same key over and over again. Setting the key number once and then reading sequentially, without KEY=, avoids the problem:
1000 REM - Set up for loop
1010 EXTRACT (1,KNUM=1)
1030 REM - MAIN LOOP
1040 READ (1,END=2000)A$,B$,C$,D$,E$
1050 … (print out)
1060 GOTO 1040
2000 REM - END OF FILE REACHED
2010 …
Removing Records in Multi-key Mode
Removing records from a multi-keyed file is done the same as with DIRECT/SORT files. The KEY= clause must be given and indicates the record to be removed by its primary key regardless of the current default KNUM.
>REMOVE (1,KEY=K$)
In any remove from an MKEYED file in multi-key mode, the key removed must be from the primary chain.
Accessing Keys in Multi-key Mode
The key functions (KEY(), KEYP(), KEYN(), KEYF(), KEYL()) behave as they do with DIRECT/SORT files, except that they apply to the current default KNUM. For example, if the alternate key set is the current default (because of a prior READ with KNUM=1), then KEYF(1) will return the first key in the alternate sort, KEYP(1) will return the prior key with respect to the alternate sort, and so on. KEY(1) will return the current key from the current record. Since there are times when it is desirable to examine other keys from the current record, KNUM= may be given when using KEY(). For example, KEY(1,KNUM=0) will return the primary key from the current record. The use of KNUM= in the KEY() function does NOT change your default KNUM. The default can only be changed by a READ or EXTRACT statement.
FID(), FIN() and FILE
As mentioned earlier, the file type returned in the FID() information
for MKEYED files is $06$. The question now arises: how to tell a single-key
MKEYED file from a multi-keyed MKEYED file? Furthermore, if multi-keyed,
where is all that key definition information kept?
For single-keyed files, the key size field in the FID() contains the usual
key size information. For multi-keyed files, the key size field of the
FID() will contain $00$, and the key information will come back through
the FIN() starting at position 86. The following are the special fields
returned by the FIN():
Position | Description
65,20 | Same as in a DIRECT file
85,1 | Current KNUM value
86 | Key definition information
The key definition information consists of 48 eight-byte entries, each with the following fields:
Bytes | Contents
1,1 | Key number or $FF$ for the end of definitions
2,1 | Field in record to use or $00$ for no fielding
3,2 | Offset in field/record (0 based, 1024 maximum)
5,1 | Length of this segment
6,1 | Segment modification bits:
 | $01$ - descending segment
 | $02$ - unique flag
 | $04$ - Business Math flag
7,2 | $0000$
Following the last segment definition, the key number byte of the next
entry must contain $FF$ to indicate the end of the segment list.
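The entry layout above can be decoded outside PRO/5 as well. The following Python sketch is a hedged illustration, not part of PRO/5: the function name is hypothetical, the 2-byte offset is assumed to be stored high byte first, and the sample bytes are built by hand from the table rather than captured from a real file.

```python
def parse_key_definitions(keydefs: bytes):
    """Decode the 8-byte segment entries that start at position 86 of FIN()."""
    segments = []
    for i in range(0, min(len(keydefs), 48 * 8), 8):
        entry = keydefs[i:i + 8]
        if len(entry) < 8 or entry[0] == 0xFF:
            break  # $FF$ in the key number byte ends the segment list
        flags = entry[5]
        segments.append({
            "knum": entry[0],
            "field": entry[1],                             # 0 = no fielding
            "offset": int.from_bytes(entry[2:4], "big"),   # assumption: big-endian
            "length": entry[4],
            "descending": bool(flags & 0x01),
            "unique": bool(flags & 0x02),
            "business_math": bool(flags & 0x04),
        })
    return segments


# Hand-built sample modeling [1:1:5],[3:5:10:"D"] followed by the $FF$ terminator.
sample = bytes.fromhex(
    "0001000005000000"   # key 0: field 1, offset 0, length 5
    "010300040A010000"   # key 1: field 3, offset 4, length 10, descending
    "FF00000000000000"   # end of definitions
)
parsed = parse_key_definitions(sample)
for segment in parsed:
    print(segment)
```

Note that the stored offset is 0 based, so a segment defined with a starting position of 5 appears with an offset of 4. Because the table positions are 1 based, a FIN() string held in a Python bytes object would be sliced from index 85 to reach position 86.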
The FILE verb can be used to create a multi-keyed MKEYED file by specifying
the optional second string, which contains the key information from the
FIN(). For example, to erase and re-define a multi-keyed MKEYED file:
>OPEN (1)"xyz"
>LET F$=FID(1),Z$=FIN(1)
>CLOSE (1)
>ERASE "xyz"
>FILE F$,Z$(86)
Using MKEYED Files
The following points may help you decide whether to use DIRECT/SORT or MKEYED for a file:
- An MKEYED file will dynamically allocate disk space as it is needed when defined with 0 records. This makes it possible to define a file with an extremely large record capacity without instantly using up a lot of disk space. However, a full MKEYED file may use MORE disk space than a DIRECT/SORT file of the same capacity.
- New keys can usually be added to a large MKEYED file FASTER than to a large DIRECT/SORT file.
- Sequentially scanning an MKEYED file will be somewhat SLOWER than scanning a DIRECT/SORT file.
- Randomly locating a key in a large MKEYED file may require more disk accesses than with a DIRECT/SORT file. However, on average there probably will not be any real difference.
- A damaged MKEYED file is harder to reconstruct than a damaged DIRECT/SORT file. This requires more control over backups, etc.
A good rule of thumb would be: The more volatile the file, the more
likely an MKEYED file will be preferred for performance. The more static
the file, the more likely a DIRECT/SORT file will be preferred.
Finally, when building an MKEYED file (or DIRECT/SORT file) always LOCK
the file if it is practical to do so. Since PRO/5 will know it has exclusive
access to the file, it will bypass the usual locking checks on every access
resulting in a SUBSTANTIAL improvement in performance.
An MKEYED file may be created with no record data (similar to a SORT file),
with record data and a user specified key (similar to a DIRECT file),
or with a record and multiple keys (MKEYED files).
Corruption Recovery Format MKEYED Files
MKEYED files can be created in or converted to a format that allows
them to be easily recovered if they become corrupted. The mkconvert
utility is a stand-alone executable that converts existing files, while
new files created with SETOPTS
7 $20$ automatically have this format.
For each stored record in a corruption recovery format file, the new format
adds a four-byte tag, key data for a single-keyed MKEYED file, and a four-byte
checksum of the key and record data used to detect the actual data corruption
of the record. This information enables the efficient recovery of record
and key data, even if the data search tree is corrupt or missing.
The tag, key data, and checksum information cause the new format file to
be larger than the original file. The size increase, however, is proportional
to the number of records in the file.