A Brief Tour of Paradox Database Files

by Mike Robinson

In the movie, The Wizard of Oz, we were exhorted to “pay no attention to the little man behind the curtain.”

But, c’mon ... let’ do.

After all, if ChimneySweep® is all about repairing Paradox database files, it might be very useful indeed for you to see for yourself exactly what all of the various files that make up “a Paradox table” actually are ... and, what they actually do.   Come closer.   Let me push the little curtain aside ...

One Table = Many Files

The first thing that we need to realize is that one Paradox table physically consists of many different files.   So, if we have the very ubiquitous Orders table in our database, there might be a half-dozen or more files present, all of them having the same basic name:   Orders-dot-something.   Almost all file-names that you will ever find on a computer’s hard drive follow this naming convention:   the name of the file always ends with “dot-something.”   This “something,” which runs from the rightmost “period” to the end of the name, is called a file extension.   It helps to describe what sort of information this particular file contains.   (For example, “Microsoft Word documents” end in .doc or .docx.)   Paradox follows the same convention.   A single table consists of a family of related files.   The base-name of the file is the name of the table, and the various extensions describe (to Paradox) what sort of data each file contains.

So... do we actually need to have a review of just what those file-extensions are, and what each file contains?   Actually, “no we don’t, so, in this article at least, we won’t.”   The only thing that you really need to know is that most of the table’s data will always be found in a file named:   tablename.DB.   Always.   No matter what other files may or may not exist, that one file will always be there.   (Or, you are in a heap of trouble ...)

Here’s what does matter:   since “one table” consists of “many files,” all of the information in every one of these files must at all times remain synchronized with what is in every other file.   Paradox uses a system of “change counters” to keep track of when each file has been updated, so that an “out-of-date” file can be recognized by the fact that its counter-values don’t match.   (ChimneySweep recognizes these counter values, but doesn’t rely upon them:   it examines the data itself.)

Footnote:   Is this kind of file-structure unique to Paradox?   By no means!   In fact, most database systems out there seem to use this strategy, including well-known client/server systems such as MySQL.   Yet, there are others that do not.   Microsoft Access, for example, stores everything in just one .MDB file.

Indexes

Indexes serve exactly the same purpose as the indexes (or table of contents) in a book:   they help you to find information quickly and directly.   Paradox tables have two kinds of indexes:   primary and secondary.   A primary index is built when you use the “*” character when constructing a table to indicate that certain fields make up the table’s primary key.   Additional (secondary) indexes are built only when you specifically request them.

A table’s primary index always occupies a single additional file.   Each secondary index occupies two more files apiece.   The internal structure of these files is more-or-less identical to that of the table’s “.DB” file ... a fact that’s not terribly important to us unless the table grows very big.   If a table is very large, its index files will probably run out of room before the table itself (sans indexes) would have.

As you change the values in fields which are indexed, the indexes must be updated also.   This causes varying amounts of file space to be consumed.   Although index files are designed to remain fairly efficient, their efficiency does tend to degrade (sometimes dramatically), such that they might need to be periodically “rebuilt.”

Index files are the most common type of file to become “out of date,” thus generating the infamous Index Out of Date message.

Memos and “BLOBs”

Most Paradox fields can contain no more than some fixed amount of information.   For example, an “A”lpha field may contain up to 255 characters, and, as it so happens, always reserves a fixed amount of space in the file whether you use all of it or not.   But there are a few special field-types which don’t abide by those rules:

Let’s call all of these field-types “BLOB fields,” and simply say that Paradox uses an additional, separate file to store the field values.

In what is otherwise a very rugged and “bulletproof” design, the BLOB-file architecture in Paradox is a surprisingly weak link.   (And, if one examines historical data within these files, as we have, the implementations over the years seem to have been a bit buggy.)   It is possible for Paradox to find itself unable to locate a field’s value within the BLOB file, which either causes the field to become blank (NULL), or causes Paradox to crash.   Whereas most table-repair solutions simply destroy BLOB-field values that they cannot find, ChimneySweep does try valiantly to find them and fix them ... but, because of the nature of the internal file structures that Paradox decided to use, it can’t always do so.

Table-Files, Blocks, and Space Recycling

As you know, every database table consists of a collection of rows (“records”), and you can insert or delete rows with impunity.   But Paradox still has to manage the space within the “.DB” file, and here’s a little bit of background information concerning just how it does it.

Space within the file is divided into blocks, and the size of a block is specified using the BDE Configuration control-panel.   Blocks are some multiple of 2 kilobytes each:   usually 2K, 4K, 8K, 16K or 32K.   Each block contains a few bytes of overhead, followed by an integral number of rows.   (Any left-over space is simply wasted.)

Rows are always stored in primary-key order, so if a new row needs to be inserted, it might need to go “in the middle of” a particular block ... causing that block to be split, more or less in half, to make room.   So, as time goes by, most of the blocks in the file won’t be full.

Finally... there are never more than 65,535 blocks, total, in any file.   This is a hard limit.

Paradox does recycle space within the table file, but does so only when a block becomes completely empty.   When a row is deleted, the corresponding block is rewritten with the row omitted.   (So, “yes, the row is really gone.”)   But there could be blocks out there which have only one row in them.

Validity Checks

Paradox databases have a well-defined system for defining “validity checks” and “table lookups.” For example, if you want to be sure that the part_number that you’re referring to in an Invoice_Line_Items table actually exists in the Parts table, Paradox can help you do that ... up to a point.   (The implementation has always left many things to be desired.)

This sort of information is stored in a separate file known as a “family file,” and, unfortunately, it is quite easily damaged.   ChimneySweep can restore this information only if “known good” information has been captured within the job itself. It can check table-lookups for inconsistencies, but will attempt to do so only if this “known good” (hence, trustworthy) information has been captured in this way.

File Headers

Every file in the Paradox system starts with some sort of header, always located at the very beginning of the file.   The information in the header not only identifies the file, but provides the information necessary to deal with its content.   So, the very first thing that Paradox (or ChimneySweep) does with any file is to read its header and validate it.

If Paradox thinks that it has detected anything wrong with the file header, it (probably...) won’t permit any sort of file access to take place.   ChimneySweep, obviously, takes a slightly different tack.

If ChimneySweep is confident that the file actually is one that it should know how to process (after all, not every file with an extension “.DB” actually is a Paradox table...), header-related problems will fall into one of three categories:

  1. Catastrophe:   The header (and probably the rest of the file) is irretrievably destroyed.
  2. Problems that can only be addressed if the job is known to contain “known good” structure information that appears to be current and reliable.
  3. Problems that can be reliably fixed by examining other information in the file.
ChimneySweep is designed to be very cautious and conservative.   If there is a strong possibility that the problem can safely be corrected by entirely automatic means, it may attempt to do so.   But ChimneySweep does generally follow its own version of the Hippocratic Oath:   “First, Do No (More) Harm.”   File-header problems are very serious, and suggestive of the existence of even worse problems.   ChimneySweep is not in the business of “being heroic.”

In Summary ...

Paradox table repair consists of thoroughly understanding these file structures and how they relate to one another.   ChimneySweep does have this depth of knowledge ... so that you don’t have to!   But it is very interesting, nonetheless.   We hope that this very brief “guided tour” has helped you.