Tuesday, June 14, 2011

How to parse D64 files, part 2: file chains

Last time, we saw how to crack open a D64 file with Perl and get at the first directory block. You may recall that two of the bytes in a directory entry point to the first block of the file proper, by way of a track number and a sector number.

Now let's see how to read files in general.

A file in a Commodore disk image is stored in blocks. Those blocks may be scattered across the disk; in fact, due to mechanical considerations, files were almost never written in a contiguous set of blocks. Instead, a block was written, then the next block was written a few sectors away, then the next a few more sectors away, and so on.

When a block is read, the next block's location (track, then sector) is found in the first two bytes. The remaining 254 bytes are file data proper. A track number of zero means there is no next block.

So then, suppose a directory entry indicates that a file begins at track $t, sector $s. Using Perl, the file could be reconstructed in a manner similar to this:

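# $diskImage holds the whole D64 image; $sectorOffset[$t] is the
# number of blocks that come before track $t.
my $fileData = readFile( '', $startTrack, $startSector );

sub readFile
{
    my ($buffer, $t, $s) = @_;
    return $buffer unless $t;    # track 0 marks the end of the chain
    my $byteOffset = 256 * ($sectorOffset[ $t ] + $s);
    # the first two bytes give the track and sector of the next block
    ($t, $s) = unpack "CC", substr( $diskImage, $byteOffset, 2 );
    # the remaining 254 bytes are file data
    $buffer .= substr( $diskImage, $byteOffset + 2, 254 );
    return readFile( $buffer, $t, $s );
}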
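The same chain walk can also be written as a loop. What follows is only a sketch under a few assumptions: the disk is a standard 35-track image, the helper name readChain is made up for illustration, and the image is a synthetic one built in the script rather than a real D64. It also assumes the usual convention that, in the last block of a file, the sector byte holds the index of the last used byte, so the final block is trimmed instead of contributing all 254 bytes.

```perl
use strict;
use warnings;

# Assumed layout of a standard 35-track disk: sectors per track by zone.
my @sectorOffset = (0, 0);    # index 0 is unused; track 1 starts at block 0
for my $t (1 .. 34) {
    my $n = $t <= 17 ? 21 : $t <= 24 ? 19 : $t <= 30 ? 18 : 17;
    push @sectorOffset, $sectorOffset[-1] + $n;
}

# Hypothetical helper: follow a block chain and return the file's bytes.
sub readChain {
    my ($image, $t, $s) = @_;
    my $data = '';
    my %seen;                                  # guards against looping chains
    while ($t) {
        die "chain loops at $t/$s\n" if $seen{"$t/$s"}++;
        my $block = substr($image, 256 * ($sectorOffset[$t] + $s), 256);
        my ($nextT, $nextS) = unpack 'CC', $block;
        # Assumption: in the last block (next track 0) the sector byte is
        # the index of the last used byte, so only bytes 2..$nextS count.
        my $len = $nextT ? 254 : $nextS - 1;
        $data .= substr($block, 2, $len);
        ($t, $s) = ($nextT, $nextS);
    }
    return $data;
}

# Synthetic two-block file for demonstration: 17/0 -> 17/10 -> end.
my $image = "\0" x (256 * 683);
substr($image, 256 * ($sectorOffset[17] + 0), 256)
    = pack('CC', 17, 10) . ('A' x 254);
substr($image, 256 * ($sectorOffset[17] + 10), 256)
    = pack('CC', 0, 4) . 'BCD' . ("\0" x 251);

my $file = readChain($image, 17, 0);
print length($file), " bytes\n";
```

The %seen hash is there because a damaged image can contain a chain that points back at itself, which would otherwise loop forever.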


@sectorOffset will require some more explanation.
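As a starting point, here is one way the table might be built. This is a sketch that assumes a standard 35-track 1541 disk, where tracks 1-17 hold 21 sectors each, tracks 18-24 hold 19, tracks 25-30 hold 18, and tracks 31-35 hold 17. The helper name sectorsInTrack is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: sectors per track on a standard 35-track disk.
sub sectorsInTrack {
    my ($track) = @_;
    return 21 if $track <= 17;
    return 19 if $track <= 24;
    return 18 if $track <= 30;
    return 17;
}

# $sectorOffset[$t] = number of blocks that come before track $t.
# Tracks are numbered from 1, so index 0 is unused.
my @sectorOffset = (0) x 36;
my $total = 0;
for my $track (1 .. 35) {
    $sectorOffset[$track] = $total;
    $total += sectorsInTrack($track);
}

print "track 18 starts at block $sectorOffset[18]\n";
print "total blocks: $total\n";
```

With this table, track 18 starts at block 357 and the whole disk comes to 683 blocks.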

Monday, June 13, 2011

How to parse D64 files, part 1

This is a short how-to for parsing Commodore 1541 diskette images, colloquially called D64 files after their 3-character suffix.

These instructions assume you know how to program in at least one C-derived programming language. I will be using Perl, but it will be very C-like.

STEP ONE: install Perl. A good and free distribution can be had over at http://activestate.org.

STEP TWO: fetch some D64 files. One likely place is http://lemon64.com.

Now it's time to write some code. A D64 file is simply the data laid out the way a 1541 would see it when reading the disk straight through, from beginning to end. So the code will have to navigate the structure of the 1541 diskette format. First, though, let's slurp the entire image into a buffer.


my $filename = 'whatever.d64';
my $filesize = -s $filename;

# open in binary mode and read the whole image in one go
open IN, '<', $filename or die "Cannot open $filename: $!";
binmode IN;
my $buffer;
read( IN, $buffer, $filesize );
close IN;
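Before going further, it can be worth checking that the file is the size a D64 ought to be. A minimal sketch, assuming a standard 35-track image: that is 683 blocks of 256 bytes, optionally followed by one error byte per block. The helper name looksLikeD64 is made up, and images with extra tracks are not handled here.

```perl
use strict;
use warnings;

# Known sizes for a 35-track image.
my $plainSize     = 683 * 256;           # 174848 bytes
my $withErrorInfo = $plainSize + 683;    # 175531 bytes

# Hypothetical helper: does this byte count look like a 35-track D64?
sub looksLikeD64 {
    my ($size) = @_;
    return $size == $plainSize || $size == $withErrorInfo;
}

print looksLikeD64(174848) ? "ok\n" : "unexpected size\n";
```

In the script above, the value to pass in would be $filesize.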


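Okay, we've got the disk in a buffer. Now what? Now we wrest the structure from this pile of bytes. That structure begins with the DIRECTORY. The directory starts at track 18, sector 1. Tracks are numbered from 1, and each of the 17 tracks before track 18 holds 21 sectors, so the directory is (18-1) * 21 + 1 blocks in, and a block is 256 bytes. So let's put that in a variable for later.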


my $directorySector = (18-1) * 21 + 1;
my $offset = 256 * $directorySector;
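To make the arithmetic concrete, this small sketch repeats the two lines above and prints the result in decimal and hex, which is handy when comparing against a hex dump of the image.

```perl
use strict;
use warnings;

# 17 full tracks of 21 sectors each, then sector 1 of track 18
my $directorySector = (18 - 1) * 21 + 1;
my $offset = 256 * $directorySector;

printf "block %d, byte offset %d (0x%X)\n",
    $directorySector, $offset, $offset;
```

This prints block 358 at byte offset 91648, or 0x16600.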


Now we need to parse the actual data from the directory. The directory block consists of eight entries of 32 bytes each. Each byte has a meaning: some point to locations on the disk, some are part of a filename, some are status bytes for the file, etc. I will iterate over the directory block, 32 bytes at a time, and at each iteration unpack some of the current 32-byte structure into its component values:


for ( my $j=0; $j<256; $j+=32 )
{
    my ($dirtrack,  # unsigned char
        $dirsector, # unsigned char
        $type,      # unsigned char
        $track,     # unsigned char
        $sector,    # unsigned char
        $filename)  # 16-character string (PETSCII, padded with 0xA0)
      = unpack 'CCCCCa16', substr( $buffer, $offset + $j, 32 );

    print "$filename [$type] is located at $track/$sector\n";
}


One very important pair of values in the above structure is $track and $sector. They tell us where to find the first block of that file.
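The raw values printed by the loop can be tidied up a little. The sketch below is one possible way to do it, not part of the script above: the helper name decodeEntry is made up, the entry is a synthetic one built with pack, and the meaning of the type byte (low bits select DEL/SEQ/PRG/USR/REL, bit 7 set means the file was properly closed) is assumed from the usual 1541 directory layout. It needs Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

my @typeNames = qw(DEL SEQ PRG USR REL);

# Hypothetical helper: decode one 32-byte directory entry.
sub decodeEntry {
    my ($entry) = @_;
    my ($dirtrack, $dirsector, $type, $track, $sector, $name)
        = unpack 'CCCCCa16', $entry;
    $name =~ s/\xA0+$//;                       # strip the 0xA0 padding
    my $kind = $typeNames[ $type & 0x07 ] // '???';
    return {
        name   => $name,
        kind   => $kind,
        closed => ($type & 0x80) ? 1 : 0,      # bit 7: file properly closed
        track  => $track,
        sector => $sector,
    };
}

# Synthetic entry for demonstration: a closed PRG file starting at 17/0.
my $entry = pack('CCCCC', 0, 0, 0x82, 17, 0)
          . 'HELLO' . ("\xA0" x 11)
          . ("\0" x 11);

my $e = decodeEntry($entry);
print "$e->{name} [$e->{kind}] is located at $e->{track}/$e->{sector}\n";
```

For the synthetic entry this prints "HELLO [PRG] is located at 17/0".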

This is a good stopping point. We've taken a D64 file, read it in, and printed out the contents of the first directory sector -- i.e., the first 256-byte block of the directory. Next time, we'll see how to read in an entire file.