Friday, June 18, 2004
A Comparison of PDS and PDSE
PDSEs (Partitioned Data Set Extended) were first introduced by IBM in MVS/DFP 3.2 in 1989. Even though regular PDSes were quite adequate for normal tasks, many IBM customers were not happy with them. One of the IBM user groups, SHARE, ran a project on MVS Storage Management and published a white paper. This paper summarized the findings of the project and asked for a number of improvements and new features to the then-current PDSes. Another IBM user group, GUIDE, also published its requirements and asked for similar changes. IBM listened and the result was the PDSE. But first, a bit of background on the plain old PDS to understand its limitations.
A PDS is essentially made up of two parts: a directory and the members. The directory is a set of contiguous 256-byte blocks, present at the beginning of the dataset. Each of these directory blocks contains a 2-byte count field at the beginning and from 3 to 21 directory entries after that. There is one directory entry for each member in the PDS. Each directory entry contains an 8-byte member name (padded with spaces, if needed), the starting position of the member in the PDS (a 3-byte address in TTR format), a 1-byte indicator field that also records the length of the user data, and some optional user data (up to 62 bytes).
A directory block will contain only as many complete entries as can fit in 254 bytes (2 bytes are reserved for count field). The remaining bytes are left unused. The length of the user data determines how many complete entries can fit in one directory block. The 2-byte count field contains the number of used (also called 'active') bytes, including the bytes used for the count field.
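The 3-to-21 range can be checked with a little arithmetic. The following is only a sketch in Python, not anything that runs on the mainframe. It assumes each entry carries a 1-byte indicator field between the TTR and the user data, which is what makes the numbers work out; the function names are made up for illustration.

```python
BLOCK_SIZE = 256      # bytes in one PDS directory block
COUNT_FIELD = 2       # bytes reserved for the count field
NAME_LEN = 8          # member name, padded with spaces
TTR_LEN = 3           # starting position of the member
INDICATOR_LEN = 1     # assumption: 1-byte indicator field per entry
MAX_USER_DATA = 62    # maximum optional user data per entry


def entry_length(user_data_len):
    """Length in bytes of one directory entry with the given user data."""
    if not 0 <= user_data_len <= MAX_USER_DATA:
        raise ValueError("user data must be between 0 and 62 bytes")
    return NAME_LEN + TTR_LEN + INDICATOR_LEN + user_data_len


def entries_per_block(user_data_len):
    """Number of complete entries that fit in the 254 usable bytes."""
    return (BLOCK_SIZE - COUNT_FIELD) // entry_length(user_data_len)


def active_bytes(entry_count, user_data_len):
    """Value of the count field: used bytes, including the count field itself."""
    return COUNT_FIELD + entry_count * entry_length(user_data_len)
```

With no user data, entries_per_block(0) gives 21; with the full 62 bytes, entries_per_block(62) gives 3. In the second case active_bytes(3, 62) is 224, so 32 bytes of the block are left unused.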
This directory structure was the reason why there was a need to improve PDS. When IBM introduced PDSE, it replaced the rigid directory structure of the PDS with a new flexible scheme and also brought in many new features. And all this was done while keeping the PDSE backward compatible with PDS. That means that except for very low-level (hardware dependent) processing, users need not even be aware of what they are dealing with.
Some of the new features introduced in PDSE, and how they compare with PDS, are given below:
- Expandable directory size
The number of directory blocks in a PDS is specified at the time of its creation and cannot be changed after that. Also, the space for all the directory blocks is allocated at the time of creating the dataset. Let's say that a PDS was allocated with a directory block count of 20. Assume that an average 256-byte directory block holds 10 directory entries. So now this PDS can contain at most 20x10 = 200 members. But what if you use this up and want to create a 201st member? Tough luck!
PDSE solved this problem by creating an indexed directory structure. Now each directory entry points to the one coming after it. This matters because there is no longer any need to allocate all the directory blocks at the time of creating the dataset. It also means that they need not be contiguous and need not be fixed in number. They can be interleaved with the member data blocks, and indeed they are! When you want to create new members, a new directory block is created in the next available storage and the pointers are updated.
Note that it's only the directory blocks that increase in number. The total size of the PDSE does not grow beyond one primary extent and 123 secondary extents. In other words, the directory can expand only if there is enough space in the dataset. The maximum size of the PDSE itself remains fixed.
- Better search and insertion
The directory entries in a PDS are stored in alphabetical order of member names. So if a new entry is to be created, all the entries coming after it need to be shifted to make room for it. This is called 'Ripple Stow' and it results in many I/O operations, making the whole process a lot slower. The same holds true for searching for a member within a PDS. The directory has to be scanned sequentially to locate a particular member.
Since the directory in a PDSE is an indexed structure, there are no such performance problems in PDSE. So it takes about the same amount of time to search for or insert a member whether its name starts with 'A' or 'Z'.
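To get a feel for why the position of the name matters in a PDS, here is a toy model in Python. It is only a sketch: it treats the directory as a sorted in-memory list and counts how many existing entries would have to move, which is not the same thing as counting real I/O operations. The function name and the member names are invented for the example.

```python
import bisect


def entries_shifted_on_insert(directory, new_name):
    """Count the existing entries that sort after new_name.

    directory is a list of member names already in alphabetical order.
    Every entry after the insertion point has to be moved to make room.
    """
    position = bisect.bisect_left(directory, new_name)
    return len(directory) - position


# Hypothetical directory, already sorted by member name.
directory = ["BETA", "DELTA", "GAMMA", "OMEGA"]

shifted_for_alpha = entries_shifted_on_insert(directory, "ALPHA")      # every entry moves
shifted_for_epsilon = entries_shifted_on_insert(directory, "EPSILON")  # only the tail moves
shifted_for_zeta = entries_shifted_on_insert(directory, "ZETA")        # nothing moves
```

In this model a name near the start of the alphabet moves every existing entry, while a name at the end moves none. That dependence on the name is what the indexed PDSE directory avoids.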
- Improved sharing facilities
The locking mechanism in a PDS operates at the dataset level. If you want to update a single member in a PDS, you need exclusive access to the entire dataset. No other user or job can update any other member in that PDS during that time. In a PDSE, by contrast, access control is implemented at the member level. So two users can update two different members at the same time. Makes you wonder how people worked before PDSE came along.
- Better use of disk space
When a PDS member gets deleted, the space that gets freed up is not used for allocating new members. Since the deletion of a member causes the deletion of its directory entry, the pointer to that member's location is lost and so is that space. As members get allocated and deleted during the lifetime of a PDS, the amount of this wasted space keeps growing. This wasted space, also called PDS gas, can be as much as 40% of the total allocated space. So the PDS needs to be compressed periodically to reclaim this space. The compression can be done either by typing 'Z' in front of the PDS name (in ISPF) or by using the IEBCOPY utility.
On the other hand, a PDSE keeps on re-claiming the freed space automatically, using a first-fit algorithm. Issuing a 'Z' command or doing an IEBCOPY has no effect on a PDSE.
Also, whenever a new member is created in a PDS, the data blocks allocated for it have to be contiguous. But there is no such restriction in a PDSE. So the space re-claimed from deleted members can be allocated to new or existing members. This results in a much better space utilization.
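The way PDS gas builds up can be shown with a very small model. This is a sketch in Python under simplifying assumptions: sizes are in arbitrary blocks, new members are always written after the last one, and a compress simply packs the live members together. The class and its method names are invented for illustration and do not correspond to any real utility.

```python
class ToyPDS:
    """Toy model of the PDS data area: deleted space is never reused."""

    def __init__(self):
        self.members = {}    # member name -> size in blocks (the "directory")
        self.high_water = 0  # blocks consumed so far in the data area

    def add(self, name, size):
        # A new member always goes after everything written so far.
        self.members[name] = size
        self.high_water += size

    def delete(self, name):
        # Only the directory entry goes away; the blocks stay consumed.
        del self.members[name]

    def gas(self):
        # Blocks that are consumed but no longer reachable from the directory.
        return self.high_water - sum(self.members.values())

    def compress(self):
        # Pack the live members together and give the dead space back.
        self.high_water = sum(self.members.values())


pds = ToyPDS()
pds.add("A", 10)
pds.add("B", 20)
pds.add("C", 5)
pds.delete("B")
gas_before_compress = pds.gas()  # the 20 blocks that belonged to B
pds.compress()
gas_after_compress = pds.gas()
```

After deleting B, the model still shows 35 blocks consumed even though only 15 are live. A PDSE would hand those 20 blocks to the next member that needs them without waiting for a compress.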
- Improved dataset integrity
If a PDS is opened for output in sequential mode, e.g. if an IEBGENER step omits the member name and uses only the PDS name, say in
//SYSUT1 DD DSN=Some.input.sequential.file,DISP=SHR
//SYSUT2 DD DSN=PDS
the entire directory would get destroyed and all the members would be lost. If a similar thing is attempted on a PDSE, the job would terminate with an abend code of S213-4C and the PDSE would remain intact.
S213-4C : WHEN OPENING A PDSE DSORG=PS WAS SPECIFIED, BUT NO MEMBER WAS SPECIFIED.
- Hardware independence
PDS uses an addressing scheme called TTR (Track-Track-Record) which is based on the DASD geometry. TTR addresses are stored in hexadecimal format. So an address of X'002E26' would mean track number X'002E' and record X'26'. The name TTR comes from the fact that the first two bytes of the address denote the track number and the third byte denotes the record number. This dependence on the DASD geometry makes it very difficult to migrate a PDS from one type of DASD to another, e.g. from 3380 to 3390.
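Splitting a TTR into its two parts is simple bit arithmetic. Here is a minimal sketch in Python that follows the description above, with the address held as a 3-byte integer; the function name is made up for the example.

```python
def decode_ttr(ttr):
    """Split a 3-byte TTR address into (track, record).

    The first two bytes are the track number and the third byte is the
    record number on that track.
    """
    if not 0 <= ttr <= 0xFFFFFF:
        raise ValueError("a TTR address is exactly 3 bytes")
    track = ttr >> 8     # upper two bytes
    record = ttr & 0xFF  # lowest byte
    return track, record


track, record = decode_ttr(0x002E26)
```

For X'002E26' this gives track X'002E' (46 in decimal) and record X'26' (38 in decimal).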
The PDSE addressing scheme is not dependent on the physical device geometry. It uses a "simulated" 3-byte TTR address to locate the members and the records which makes the migration easier. Incidentally, this simulation of addresses places some limitations on the number of members and the number of records per member in a PDSE. A TTR address of X'000001' in a PDSE points to the directory. The addresses from X'000002' to X'07FFFF' point to the first record of each member, which is why there is a limit of 524,286 members. The addresses from X'100001' to X'FFFFFF' point to records within each member, which is why there is a limit of 15,728,639 records in each member.
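The two limits fall straight out of the address ranges. The following Python sketch just does that arithmetic and classifies a simulated TTR by range. The names are invented for illustration, and any address outside the three ranges listed above is simply reported as "other", since its meaning is not covered here.

```python
DIRECTORY_TTR = 0x000001
MEMBER_FIRST, MEMBER_LAST = 0x000002, 0x07FFFF  # first record of each member
RECORD_FIRST, RECORD_LAST = 0x100001, 0xFFFFFF  # records within a member

# Both ranges are inclusive, hence the +1.
MAX_MEMBERS = MEMBER_LAST - MEMBER_FIRST + 1
MAX_RECORDS_PER_MEMBER = RECORD_LAST - RECORD_FIRST + 1


def classify_pdse_ttr(ttr):
    """Say which of the ranges a simulated PDSE TTR falls into."""
    if ttr == DIRECTORY_TTR:
        return "directory"
    if MEMBER_FIRST <= ttr <= MEMBER_LAST:
        return "member"
    if RECORD_FIRST <= ttr <= RECORD_LAST:
        return "record"
    return "other"  # not covered by the ranges described above
```

MAX_MEMBERS works out to 524,286 and MAX_RECORDS_PER_MEMBER to 15,728,639, matching the limits quoted above.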