Here's my working syntax for CargoCult.
assumptions
numerical expressions are per C standards.
variable declarations:
my [<type>] <id> [= <initialization expression>];
Array variables start with the sigil '@'. They're indexed with square brackets, as in C.
function declarations:
fn <returntype> <name> parm1, parm2, ...
<body>
endfn
function calls:
[<return value> =] <function name>([<parameters>]);
function parameters are comma-separated, and typed or untyped. If typed, the type precedes the identifier, e.g. callMyFunction( String foo, int bar );
for loops (currently only increment, by 1):
for |<indexname>| <start>..<end>
<body>
endfor
if statements:
if ( <expression> )
<body>
endif
return statements:
return <expression>;
Showing posts with label Programming. Show all posts
Monday, August 13, 2012
Wednesday, August 8, 2012
CargoCult as an Intermediate Language
I face the onerous task of converting my commodore image reading code from AS3 into Perl and Objective-C.
Rather than port code twice to two platforms, I'd rather use CargoCult as the specification, and use real languages as targets.
I don't have to get 100% code conversion: I just need to get 80% of the way there to make this worthwhile.
That means CargoCult is a high-level Intermediate Language of sorts. It's C-like, but uses syntactic sugar in a way that makes it relatively easy to write generators to transform it to other languages. My goal is to be able to make line-by-line translators without having to do any real analysis of the code.
Here's a sample of CargoCult 1.0.
fn int buildZones totalSectors, startTrack, @zones
for |index| 0..@zones.length
my track = 1 + GLOBAL.totalTracks + startTrack;
my sectorCount = @zones[index][1];
my endTrack = 1 + GLOBAL.totalTracks + @zones[index][0];
GLOBAL.totalTracks += @zones[index][0];
for |jdex| track..endTrack
GLOBAL.@trackOffset[ jdex ] = totalSectors * 0x100;
GLOBAL.@sectorOffset[ jdex ] = totalSectors;
GLOBAL.@sectorsInTrack[ jdex ] = sectorCount;
totalSectors += sectorCount;
endfor
endfor
return totalSectors;
endfn
I've successfully translated this into fully functional Perl, ActionScript3, and Objective-C. It took 80 lines of code for each, but after that the translator was able to translate another CargoCult function, as well.
What I want to do next is build up a set of translations for each target language, for each transformation needed (line preprocessing, library call handling, subroutine handling, loop handling, and line postprocessing).
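As a rough illustration of the line-by-line idea, here is a minimal Python sketch that pushes a small CargoCult subset through an ordered table of regex rules for an ActionScript 3 target. The rule table, the name `translate`, and the choice to treat the `<end>` of a range as exclusive are all assumptions made for this sketch; it is not the actual 80-line translator, and it ignores typed declarations and library calls.

```python
import re

# Hypothetical rule table for an ActionScript 3 target: (pattern, replacement)
# pairs applied in order to each stripped source line.
AS3_RULES = [
    # fn <returntype> <name> p1, p2, ...  ->  function name(p1, p2):returntype {
    (re.compile(r"^fn (\w+) (\w+) ?(.*)$"), r"function \2(\3):\1 {"),
    # for |i| a..b  ->  counting loop (exclusive end is an assumption)
    (re.compile(r"^for \|(\w+)\| (.+?)\.\.(.+)$"),
     r"for (var \1:int = \2; \1 < \3; \1++) {"),
    (re.compile(r"^if \( ?(.+?) ?\)$"), r"if (\1) {"),
    (re.compile(r"^end(fn|for|if)$"), "}"),
    (re.compile(r"^my "), "var "),
    (re.compile(r"@(\w+)"), r"\1"),  # drop the array sigil
]


def translate(source: str) -> str:
    """Translate CargoCult text one line at a time, with no analysis of the code."""
    out, depth = [], 0
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        for pattern, replacement in AS3_RULES:
            line = pattern.sub(replacement, line)
        if line == "}":
            depth -= 1          # closing a block: dedent before emitting
        out.append("    " * depth + line)
        if line.endswith("{"):
            depth += 1          # opening a block: indent what follows
    return "\n".join(out)
```

Feeding the buildZones sample through `translate` should give something close to AS3, with the remaining scraps (types on locals, the GLOBAL object) left for hand editing or further rules.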
Wednesday, April 4, 2012
The D40 Commodore Image format
The D40 is an exercise in using my flexible image configuration. Its goal is to design the largest image possible, which still uses the header block (and only the header block) to store BAM entries.
In other words, this format uses the header block as its primary design limitation.
Header
The most efficient existing Commodore header is used by the D81. It makes a great place to start out.
The D81's header starts out the same as all others: a two-byte pointer to the first directory sector, then the DOS type, then one byte with $00, for a total of 4 bytes.
The label offset is at 0x04. All Commodore labels consist of a 16-byte label, two $A0 bytes, two bytes for the Disk ID, one more $A0, two bytes for the DOS Type, and a final $A0 byte, for a total of 24 bytes.
That leaves (256 - 28) = 228 bytes for the BAM. Now the fun begins.
BAM
The Block Allocation Map (BAM) consists of an array of records, one record for each track on the disk. The first byte of each record is the "sectors free" count for that track. The remaining data is a bitmap of the sector allocation for that track: a "1" means that sector is used, while a "0" means the sector is unallocated and free for use.
A little calculation will find several schemes which fit into 228 bytes. Larger bitmaps tend to be more efficient with the space available. The layout I select is 25 tracks of 64 sectors each. The number of bytes needed for the BAM is (25 x (1+64/8)) = 25 x 9 = 225 bytes.
The total capacity of this image would be (25 x 64) blocks = 1600 blocks, or 400k.
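The "little calculation" can be sketched in Python. For a few sectors-per-track choices, it finds the most tracks whose BAM records still fit in the 228 bytes left in the header block. The function name `bam_schemes` and the candidate sector counts are illustrative assumptions, not part of the format.

```python
HEADER_BYTES = 256
FIXED_BYTES = 4 + 24                      # directory pointer, DOS type, $00, then the 24-byte label
BAM_BUDGET = HEADER_BYTES - FIXED_BYTES   # 228 bytes left for the BAM


def bam_schemes(budget=BAM_BUDGET, sector_choices=(8, 16, 32, 64)):
    """Return (sectors, tracks, bam_bytes, blocks, kilobytes) for each choice."""
    schemes = []
    for sectors in sector_choices:
        record = 1 + sectors // 8         # "sectors free" byte plus one bit per sector
        tracks = budget // record         # most whole records that fit
        blocks = tracks * sectors
        schemes.append((sectors, tracks, tracks * record, blocks, blocks * 256 // 1024))
    return schemes


for scheme in bam_schemes():
    print("%3d sectors x %3d tracks: BAM %3d bytes, %4d blocks, %3dk" % scheme)
```

With 64 sectors per track this gives 25 tracks, a 225-byte BAM, and 1600 blocks (400k), matching the figures above; the smaller bitmaps waste more of the budget on "sectors free" bytes.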
Layout
I like to see the header at track 1 -- it's easier for a programmer to get at than at midpoint. The directory can have the remaining 63 sectors in track 1, for a maximum of 63 * 8 = 504 files, which should be plenty.
The remainder of the disk is usable for file storage. Track 1's 64 sectors take 16k, for a total storage space of 400 - 16 = 384k.
Tuesday, April 3, 2012
Commodore Disk Image headers, again
One minor nitpick about Commodore disk images is that they have no signature line. The only way you can tell what they are is to look at the extension, the file size, and perhaps try to jump to the header sector and "see" if it looks right. While this is not a major problem, I think there is a simple solution; namely, to add a signature to each disk image.
A signature is a small, initial data set which you can use to determine the nature of the disk unconditionally. My suggestion is to look for an optional 32-byte signature on all Commodore images; if it proves useful, then over time all such images will have this signature.
Examples.
D64 images will start with "1541 DISK IMAGE ".
D71 images will start with "1571 DISK IMAGE ".
D81 images will start with "1581 DISK IMAGE ".
D82 images will start with "8250 DISK IMAGE ".
...and so on.
The remaining 16 bytes should be used to specify the image configuration as clearly as possible. For example, the D64 should have a byte for how many tracks are present (i.e. 35, 40, or some other number), and a byte indicating whether or not error bytes are appended to the end of the image. I would also suggest another byte used to indicate an auxiliary directory track, but that starts to make things complicated.
As I said, this data can be inferred from the image itself, but it is much better to be explicit, and the simplest way to do that is to lead with a short signature block.
Having a "number of tracks" byte could be space-efficient as well, because many images have content much smaller than the disk's capacity; in these cases it would be possible to publish a smaller D64. Since the 18th track is required, the smallest D64 would be 18 tracks long, or about 95k -- almost half the size of the standard D64.
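A reader for such a signature could look roughly like the Python sketch below. The four 16-byte labels come from the examples above; everything about the second 16 bytes (track count in the first byte, error-byte flag in the second) is an assumed layout chosen only for illustration, as is the name `read_signature`.

```python
# The 16-byte labels proposed above, mapped to the image type they announce.
SIGNATURES = {
    b"1541 DISK IMAGE ": "D64",
    b"1571 DISK IMAGE ": "D71",
    b"1581 DISK IMAGE ": "D81",
    b"8250 DISK IMAGE ": "D82",
}


def read_signature(image: bytes):
    """Return a dict describing the image, or None if no signature is present."""
    if len(image) < 32:
        return None
    kind = SIGNATURES.get(image[:16])
    if kind is None:
        return None                       # unsigned image: fall back to extension and size
    config = image[16:32]
    return {
        "kind": kind,
        "tracks": config[0],              # assumed position of the "number of tracks" byte
        "has_error_bytes": bool(config[1]),  # assumed position of the error-bytes flag
    }
```

Because the signature is optional, a `None` result just means the reader carries on with the usual extension and file-size guesswork.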
Wednesday, March 28, 2012
Java to ActionScript (via Perl)
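#!/usr/bin/perl
# Rough Java-to-ActionScript filter: reads Java source, prints approximate ActionScript.
while(<>)
{
    s/\bfinal//;                                          # drop Java's 'final' modifier
    s/\b(int|long) (\w+)/ var $2:int/;                    # int/long declarations
    s/\bboolean (\w+)/ var $1:Boolean/;                   # boolean declarations
    s/\bString (\w+)/ var $1:String/;                     # String declarations
    s/System\.out\.println/trace/;                        # console output
    s/ (void|int|String) (\w+\(.*?\))/ function $2:$1/;   # method signatures
    print;
}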
The Commodore 1541 disk drive is a computer, with a 6502 microprocessor and its own RAM. It talks to the Commodore 64 via a hastily-built proprietary serial variant of the IEEE-488 bus.
And it's a pain to emulate.
Luckily, it's a solved problem, more or less, if your chosen programming language is C++ or Java. If you want to do it in, say, ActionScript, then you are out of luck.
...unless you know Perl.
ActionScript, as you may know, has a fuzzy relationship with Java. Its compiler is written in Java. Its VM may very well be based on the JVM. So it is no surprise that ActionScript source is in many ways a cipher of Java.
I wrote a very small Perl script to convert Java source to ActionScript source. It doesn't do a 100% job, but in all things the best is the enemy of the good, and the Burrito Principle holds (80% of the meat is in 20% of the burrito). So this gets me most of the way there, leaving small scraps to deal with (instead of facing a complete and more tedious rewrite).
Saturday, March 24, 2012
I like Objective C
So far. I'm not sure if it's as accessible as ActionScript, but I really appreciate its strict adherence to Design Patterns. Just a few lessons in, and we've already done MVC and Delegates.
And, of course, I always loved the Smalltalk syntax.
Friday, March 2, 2012
Time for a new internet browser
Time to be an old grump for a moment.
I've said it before, I'll say it again. It's time to rewrite the browser. Invent, create, realize a new way of browsing the internet.
Forget HTML, JavaScript, FlashPlayer et al. Computers are powerful; why aren't browsers? Why can't you develop on the browser the same way you develop directly onto the operating system? Why isn't there a virtual machine to which you may directly target compilers? That way, you have your cake and can eat it, too.
I'm not saying the browser should be an operating system; it's an application. However, it should integrate with operating systems. For example, security is an OS problem; it should not be an application's problem. Why solve the same problem over and over again? There are realtime impacts to this: HTTP and HTTPS are heavy compared to TFTP.
I am saying that HTML is annoying. I don't think HTML5 will solve that problem - at least, it won't solve it anytime soon. HTML is to the browser like Java is to the OS: it's a language, a display and layout language. It defines the View.
Thursday, February 23, 2012
CargoCult, part one
This is a post about my dream language, which I've named CargoCult. It's a mashup of Perl, Objective-C, JavaScript, Shell, and other things.
It does NOT eschew the use of shifted characters -- it just requires that they be important, with a value greater than the extra effort of typing shift + something.
Object Notation
CargoCult is a dynamic object language. This means you have type-able structures, potentially dynamic, which have attributes and methods.
Core language features -- arrays, hashes, variables -- are objects. For example, the implicit array type is an object, so you can do things like this:
return [d1, d2, d3].sort.reverse.pop;
Hashes and arrays use the grouping notation of square brackets. An array is a comma-separated list of scalars. A hash is a comma-separated list of assignments.
my array = 1, 2, 3, 'four'; # also [ 1, 2, 3, 'four' ]
my hash = [year = 2012, month = two, day = 23];
Hash and array accesses are object calls.
my value = hash.year;
my other_value = array.0;
Method Calling with Parameters
When we write methods in any language, we typically name formal parameters. For example:
string myFunction( foo, bar )
{
foo + ': ' + bar;
}
foo and bar are formal parameters, i.e. the names used in the method.
When calling a method, parameters are passed in by name. In other words, the parameters are more or less a hash.
my str = myObj myFunction .foo 'hello' .bar 'world';
When you have to nest the call, use the backslash to indicate a method call (rather than a new array), and then square brackets for grouping.
my str = myObj myFunction .foo \[myObj myFunction .foo 'hello' .bar 'world'] .bar '!';
Wednesday, September 7, 2011
Perl, Javascript, and Traveller
For a couple years I've had a suite of Javascript pages for building stuff for Traveller, 5th edition. They use Traveller's new rules for stacking attributes onto a base item, resulting in myriads of armor, weapons, vehicles and smallcraft.
Very recently, I added a backend Perl component which will email the results to you. So say you're on your iPad and define a fast civilian grav truck which can reach orbit. Just enter your email address and it will email the results to you.
I wrote that bit because I wanted to use my scripts from my iPad, and didn't want to rely on cut-and-paste.
A link to the scripts is here: http://eaglestone.pocketempires.com/scripts/armormaker.html
Wednesday, July 20, 2011
ANT is insane
So I'm learning how to use ANT. Or, rather, I'm learning the insane limitations of ANT tasks.
Here's how it works. Every ANT task has its own API, its own behavior, and a very limited way of operating.
Example: foreach. I cannot pass in a list of module names, referenced from an XML Property file, into foreach for processing. Can't be done. Why? Because the coder didn't code that in.
But why did the coder have to specify the types of data a foreach statement could take? Why is a generic iterator required to go beyond iteration? And yet foreach does; apparently it must, presumably because ANT has insufficient expressive power.
If that's a fundamental limitation of the ANT core system, then why didn't the original writer write ANT in a more generic fashion? This blows me away. It's insane.
If we're going to create a de facto script language out of XML, let's do it right, folks.
(1) Accept a LIST, and nothing more. If this means ANT needs a foundational LIST type, then so be it. If this means foreach needs to be rewritten to accept a LIST and only a LIST, then so be it. To do anything else is INSANE.
(2) Define tasks to be performed inside the foreach. After all, this is control flow. Don't make me perform an assembly-language-like JUMP. Do It Right.
<foreach list="${my.modules}" param="item">
<compile sourcePath="${item.sourcePath}"
destPath="${item.destPath}"/>
</foreach>
Friday, July 15, 2011
i64 Diskette Image header
A "D64 file" is an image file of a real Commodore 1541 diskette. D64s, and related formats, have been around since the 90s. In a previous post, I explained that it might be nice to see a more flexible, parametric approach to Commodore disk images.
I'm currently calling this format "i64", although I haven't fully settled on it.
An old draft of the format is here.
In short, it's simply an optional header block, with parametric data explaining the structure of the image and important offsets.
It's present in any disk image which is of a non-standard size. For example, if a .d81 file is not the correct size for a d81 image, then look at the first block for the i64 custom header. If it's present, it's located in the first 256 bytes of the disk image -- where a header ought to be located. Otherwise, drive on as usual.
This fancy new header tells us very explicit things about the image.
Structure is 8 bytes, and lays out the four zones of every Commodore disk: the number of tracks per zone, and the number of sectors per track within that zone. Also whether the disk is double-sided (thereby duplicating the zone data for side 2).
Error data can either be prepended to the actual tracks, or appended.
The locations of the disk header block, directory and BAM are required. Offsets to the header and BAM data are required, and how the BAM is stored in the case of dual-sided disks, or indeed if the BAM exists at all.
An autoboot BAM offset, if any. A boot track, if any.
Interleaves.
The disk's default format type and DOS type.
Whether the REL field is used for the LSU.
Whether the file's timestamp is stored in the directory entry.
Whether the Directory is allowed to freely grow and range, like any other file.
Whether the image is allowed to grow dynamically, or is pre-allocated on creation.
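The "is this image a non-standard size?" test that triggers the header lookup can be sketched in Python. The sector counts are the usual ones for 35- and 40-track D64s, the D71, and the D81, each with or without one appended error byte per sector; the function names are hypothetical, and how the header itself is recognised is not covered, since that is not spelled out here.

```python
import os

BLOCK = 256

# Sector counts of the standard layouts, keyed by file extension.
STANDARD_SECTORS = {
    ".d64": (683, 768),   # 35-track and 40-track 1541 images
    ".d71": (1366,),
    ".d81": (3200,),
}


def standard_sizes(ext):
    """All file sizes considered 'correct' for an extension."""
    sizes = set()
    for sectors in STANDARD_SECTORS.get(ext.lower(), ()):
        sizes.add(sectors * BLOCK)             # plain image
        sizes.add(sectors * BLOCK + sectors)   # with one error byte per sector appended
    return sizes


def should_check_for_header(filename, size):
    """True when the size is not a standard one, so the first block may be a custom header."""
    ext = os.path.splitext(filename)[1]
    return size not in standard_sizes(ext)
```

An unknown extension has no standard sizes in this sketch, so it always falls through to the header check; a reader could just as reasonably reject such files outright.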
The future of the D64 format
The Commodore 64 is an ancient 8-bit system, enjoying a niche fanbase of aging gen-Xers and the youngest Boomers. Emulators are sophisticated and sufficient. You can run games off of a Commodore disk drive attached to your Mac or PC or Linux box. You can run games off of a Commodore computer attached to your home computer acting like a disk drive. You can buy SD-card-reading hardware that replaces a Commodore disk drive. You can buy a joystick that has an embedded C64 burned into its chips. Commodore diskette and hard drive images are insignificantly tiny, compared to today's storage capacities.
What is there to improve on, and why bother?
The second question is the easiest to answer: because the problem space is interesting and well-scoped.
The first question requires creativity. However, I do have some ideas.
Networked "UDP1541"
Current systems bundle all software into one application. Your C64 emulator includes disk and tape support at some level. Some are very sophisticated.
IEC emulation requires effort. It's painful. So I say, if someone has gone to the effort, make it shareable: split the disk drive into a separate application, and communicate via a primitive IEC-friendly protocol over UDP. Not only would this be a fun project, it would also allow emulators to use these "devices" even if they were programmed on a completely different platform. The FC64 AIR app could use the VICE UDP1541.
If you really must, then write it as TCP1541, i.e. using TCP/IP instead of UDP. In this way the C64 emulators may access Commodore drives located on any server anywhere on the internet.
3rd Generation Image Support
The venerable D64 works for 99% of all emulator needs. The G64 works for the remaining 1%. So why is another format needed?
Think of it the other way 'round. Emulated drives are common. The D64 format is somewhat irrelevant; it's the emulation support that's more important. An opportunity arises to support the 1541's quirks while not being constrained by them.
So rather than a new format, I suggest that disk drive emulators should be highly parameterized, and tweaked to read those parameters from disk images when present.
I see three areas of compatibility: (1) small programs, (2) large programs, and (3) experimentation.
First, MY Solution
My solution is an image header block: the first block, if identified by an identification label and image configuration parameters, would not only tell any reader exactly where all data is located, but also the structure of the data itself -- for example, the number of tracks in each of the four zones, how the BAM is stored in special cases, if error bytes are prepended or appended, if the image is allocated fully or is permitted to grow, and so on. This information and more sits easily within one block of data, and essentially parameterizes a disk image reader.
I think the best-case scenario would be for disk drive emulators to look for this block on any mounted image that's not the correct size for a typical image file.
1. Small Programs
Where you have small programs, and yet want to remain compatible with a disk-image milieu, it would be nice to have smaller disk images. This argues for a flexible file format at least. One way to do this is to reorganize the disk tracks, so that the header+directory track comes first, and the remaining tracks are added only as needed, in write-order, based on an explicit mapping.
Of course, if you have the flexible image format, all you have to do is define the format, and you're done.
2. Large Programs
The other potential gain is in large, multi-disk programs. In this case, it would be nice to be able to store more than 174k in one image. You could define a larger disk image in some cases.
However, I think a better solution in many (but not all) cases is for emulators to understand the TAR format, and use archiving to group related disks together.
3. Experimentation
This to me is the funnest part. If a disk reader is highly parameterized, there are lots of custom images you can make, and you'd be free to explore the space without worrying about support.
I have custom image formats that I've played with, and found that a little bit of parametric data goes a long way. While not immediately practical, there is potential for interesting formats. Perhaps this is the best way to archive programs, too: let the file size dictate how many zones, how many tracks, how many sectors to have.
What is there to improve on, and why bother?
The second question is the easiest to answer: because the problem space is interesting and well-scoped.
The first question requires creativity. However, I do have some ideas.
Networked "UDP1541"
Current systems bundle all software into one application. Your C64 emulator includes disk and tape support at some level. Some are very sophisticated.
IEC emulation requires effort. It's painful. So I say, if someone has gone to the effort, make it shareable: split the disk drive into a separate application, and communicate via a primitive IEC-friendly protocol over UDP. Not only would this be a fun project, it would also allow emulators to use these "devices" even if they were programmed on a completely different platform. The FC64 AIR app could use the VICE UDP1541.
If you really must, then write it as TCP1541, i.e. using TCP/IP instead of UDP. In this way the C64 emulators may access Commodore drives located on any server anywhere on the internet.
3rd Generation Image Support
The venerable D64 works for 99% of all emulator needs. The G64 works for the remaining 1%. So why is another format needed?
Think of it the other 'way 'round. Emulated drives are common. The D64 format is somewhat irrelevant; it's the emulation support that's more important. An opportunity arises to support the 1541's quirks while not being constrained by it.
So rather than a new format, I suggest that disk drive emulators should be highly parameterized, and tweaked to read those parameters from disk images when present.
I see three areas of compatability: (1) small programs, (2) large programs, and (3) experimentation.
First, MY Solution
My solution is an image header block: the first block, if identified by an identification label and image configuration parameters, would not only tell any reader exactly where all data is located, but also the structure of the data itself -- for example, the number of tracks in each of the four zones, how the BAM is stored in special cases, if error bytes are prepended or appended, if the image is allocated fully or is permitted to grow, and so on. This information and more sits easily within one block of data, and essentially paramaterizes a disk image reader.
I think the best-case scenario would be for disk drive emulators to look for this block on any mounted image that's not the correct size for a typical image file.
1. Small Programs
Where you have small programs, and yet want to remain compatible with a disk-image milieu, it would be nice to have smaller disk images. This argues for a flexible file format at least. One way to do this is to reorganize the disk tracks, so that the header+directory track comes first, and the remaining tracks are added only as needed, in write-order, based on an explicit mapping.
Of course, if you have the flexible image format, all you have to do is define the format, and you're done.
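The explicit mapping could be as simple as a table from track number to storage slot, filled in as tracks are first written. This is only a sketch under my own assumptions; the names @writeOrder, %slotOf, and slot_for_track are hypothetical.

```perl
use strict;
use warnings;

# Track 18 (header + directory) always takes the first slot;
# every other track gets the next free slot the first time it is written.
my @writeOrder = (18);
my %slotOf     = ( 18 => 0 );

sub slot_for_track
{
    my ($track) = @_;
    unless ( exists $slotOf{$track} )
    {
        push @writeOrder, $track;
        $slotOf{$track} = $#writeOrder;
    }
    return $slotOf{$track};
}

# Writes touch tracks 17, 19, 17 again, then 1.
slot_for_track($_) for ( 17, 19, 17, 1 );
print "tracks stored, in order: @writeOrder\n";
```

An image built this way holds only the four tracks actually used, and @writeOrder is the mapping that would need to be saved alongside them.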
2. Large Programs
The other potential gain is in large, multi-disk programs. In this case, it would be nice to be able to store more than 174k in one image. You could define a larger disk image in some cases.
However, I think a better solution in many (but not all) cases is for emulators to understand the TAR format, and use archiving to group related disks together.
3. Experimentation
This to me is the funnest part. If a disk reader is highly parameterized, there are lots of custom images you can make, and you'd be free to explore the space without worrying about support.
I have custom image formats that I've played with, and found that a little bit of parametric data goes a long way. While not immediately practical, there is potential for interesting formats. Perhaps this is the best way to archive programs, too: let the file size dictate how many zones, how many tracks, how many sectors to have.
Tuesday, June 14, 2011
How to parse D64 files, part 2: file chains
Last time, we saw how to crack open a D64 file with Perl and get at the first directory block. You may recall, two of the bytes in a directory entry point to the first block of the file proper, by way of a Track number and Sector number.
Now let's see how to read files in general.
A file in a Commodore disk image is stored in blocks. Those blocks may be scattered across the disk; in fact, due to mechanical considerations, files were almost never written in a contiguous set of blocks. Instead, a block was written, then the next block was written a few sectors away, then the next a few more sectors away, and so on.
When a block is read, the next block's location is found in the first two bytes. The remaining 254 bytes are file data proper.
So then, suppose a directory entry indicates that a file begins at track $t, sector $s. Using Perl, the file could be reconstructed in a manner similar to this:
my $fileData = readFile( '', $startTrack, $startSector );

sub readFile
{
    my ($buffer, $t, $s) = @_;
    return $buffer unless $t; # a next-track of 0 ends the chain
    my $byteOffset = 256 * ($sectorOffset[ $t ] + $s);
    # the first two bytes of each block point to the next block
    ($t, $s) = unpack "CC", substr( $diskImage, $byteOffset, 2 );
    # the remaining 254 bytes are file data
    $buffer .= substr( $diskImage, $byteOffset + 2, 254 );
    return readFile( $buffer, $t, $s );
}
@sectorOffset will require some more explanation.
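Here is one way @sectorOffset might be built, assuming the standard 35-track 1541 layout of four zones (21, 19, 18, and 17 sectors per track). The @zones table and the loop are my own sketch; the idea is just that $sectorOffset[$t] counts the blocks that come before track $t.

```perl
use strict;
use warnings;

# Standard 1541 zones: [ number of tracks, sectors per track ].
my @zones = ( [ 17, 21 ], [ 7, 19 ], [ 6, 18 ], [ 5, 17 ] );

# $sectorOffset[$t] = number of blocks preceding track $t (tracks count from 1).
my @sectorOffset;
my ( $track, $total ) = ( 1, 0 );
for my $zone (@zones)
{
    my ( $trackCount, $sectorsPerTrack ) = @$zone;
    for ( 1 .. $trackCount )
    {
        $sectorOffset[ $track++ ] = $total;
        $total += $sectorsPerTrack;
    }
}

print "track 18 starts at block $sectorOffset[18]; the disk holds $total blocks\n";
```

With this table, 256 * ($sectorOffset[$t] + $s) is the byte offset of track $t, sector $s in the image.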
Monday, June 13, 2011
How to parse D64 files, part 1
This is a short how-to for parsing Commodore 1541 diskette images, colloquially called D64 files after their 3-character suffix.
These instructions assume you know how to program in at least one C-derived programming language. I will be using Perl, but it will be very C-like.
STEP ONE: install Perl. A good and free distribution can be had over at http://activestate.org.
STEP TWO: fetch some D64 files. One likely place is http://lemon64.com.
Now it's time to write some code. D64 files are simply data laid out the way a 1541 would see it as it reads it straight from the disk, from the beginning to the end. So the code will have to navigate the structure of the 1541 diskette format. First though, let's slurp the entire image into a buffer.
my $filename = 'whatever.d64';
my $filesize = -s $filename;
open my $in, '<', $filename or die "Cannot open $filename: $!";
binmode $in;
my $buffer;
read( $in, $buffer, $filesize );
close $in;
Okay, we've got the disk in a buffer. Now what? Now we wrest the structure from this pile of bytes. That structure begins with the DIRECTORY. The directory lives on track 18, starting at sector 1. The 17 tracks before it hold 21 sectors each, so the directory offset is (18-1) * 21 + 1 blocks in, and a block is 256 bytes. So let's put that in a variable for later.
my $directorySector = (18-1) * 21 + 1;
my $offset = 256 * $directorySector;
Now we need to parse the actual data from the directory. The directory block consists of eight entries of 32 bytes each. Each byte has a meaning: some point to locations on the disk, some are part of a filename, some are status bytes for the file, etc. I will iterate over the directory block, 32 bytes at a time, and at each iteration unpack some of the current 32-byte structure into its component values:
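for ( my $j=0; $j<256; $j+=32 )
{
    my ($dirtrack, # unsigned char: track of the next directory block
        $dirsector, # unsigned char: sector of the next directory block
        $type, # unsigned char: file type and status bits
        $track, # unsigned char: track of the file's first block
        $sector, # unsigned char: sector of the file's first block
        $filename) # 16-character name, padded with 0xA0 bytes
        = unpack 'CCCCCa16', substr( $buffer, $offset + $j, 32 );
    print "$filename [$type] is located at $track/$sector\n";
}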
One very important pair of data in the above structure are $track and $sector. They tell us where to find the first block of that file.
This is a good stopping point. We've taken a D64 file, read it in, and printed out the contents of the first directory sector -- i.e. the first 256-byte block of the directory. Next time, we'll see how to read in an entire file.
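The raw $type and $filename values print a bit cryptically. Here's a small sketch of how they might be tidied up, going by my understanding of the 1541 directory format: the low bits of the type byte select the file type, bit 7 means the file was closed properly, bit 6 means it is locked, and names are padded with 0xA0 bytes. The describe_entry helper is hypothetical.

```perl
use strict;
use warnings;

my @TYPE_NAME = qw( DEL SEQ PRG USR REL );

# Turn the raw type byte and padded name from a directory entry
# into something closer to what a directory listing shows.
sub describe_entry
{
    my ( $type, $filename ) = @_;
    ( my $name = $filename ) =~ s/\xA0+\z//;         # strip the 0xA0 padding
    my $kind   = $TYPE_NAME[ $type & 0x07 ] // '???';
    my $closed = ( $type & 0x80 ) ? '' : '*';         # '*' marks an unclosed file
    my $locked = ( $type & 0x40 ) ? '<' : '';         # '<' marks a locked file
    return sprintf '"%s" %s%s%s', $name, $closed, $kind, $locked;
}

# Usage: a closed PRG file named HELLO, padded out to 16 bytes.
print describe_entry( 0x82, "HELLO" . ( "\xA0" x 11 ) ), "\n";
```

Inside the loop above, the print line could call describe_entry( $type, $filename ) instead of printing the raw values.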
Friday, May 27, 2011
Main Points from Adobe
So I attended a day-long presentation on Acrobat and Creative Suite 5.5 recently. Here are the primary take-aways I got out of it:
* mobile is "big" for Adobe (now). Really!
* the typical information worker spends 17 hours a week *creating* content.
* rich media will be 25% of content by 2013.
* HTML5 is not a document format.
^ That's something which occurred to me during the Acrobat-portion of the presentation.
* process improvements and reducing costs are the top priorities in companies.
* process improvement leads to productivity gain.
* reduced costs are achieved via standards, best practices.
* that's what drove Adobe's work on Acrobat X. Editing content, table data, headers, OCR, SharePoint integration, highlighter and sticky-notes, PPT to PDF, embedded QT and MPG, commenting tools, forms, digital signatures, document comparison, a macro system, legal and governmental electronic document standards support, etc.
That's it in a nutshell.
Oh yes: Flex/ActionScript content is maybe 10% of Adobe's business.
Friday, May 6, 2011
Modern Perl
I was talking with an acquaintance the other day about Modern Perl -- i.e. the current way of coding Perl.
Perl's flexibility is evident: code written in 1994's Perl will still run on 2011's Perl, and yet you can do things today that you couldn't do back then.
This is due to at least two things:
(1) there's an active core development of Perl, which extends the language in useful but backwards-compatible ways (this is possible partly due to the power, flexibility, and dynamism of Perl itself),
and
(2) there's active module development by the community to meet the programming needs of the community. CPAN continues to astound me. There is nothing with the scope and depth of CPAN anywhere for any other programming language.
Languages like Scheme and Python will insist on a core orthogonal operating set. And with some languages (like Smalltalk), you can go a long way on that. But let's face it, you have to get work done sooner or later. Even Common Lisp understood that. Perl's amazing library of modules lets you do anything you like... even emulate and embed other languages.
Say you've got a relatively clean, small (and therefore embeddable), C-like language such as Lua, which includes hashtables and some improved control structures, used for scripting small actions. As a loose superset of these sorts of languages, Perl is ideal for aggregating and linking libraries, finding and factoring out common code fragments, and even generating the code and acting as a test harness. In fact, Perl modules could exist today for performing most of these functions -- a Git module could double as a library front end; a text tool could find redundant code; and a template engine could generate code from a JSON structure.
You can't do that with Lua itself. But you could do it all with Perl. And since Perl's syntax descends from C, it's easy to pick up Perl as another tool in your belt.
Thursday, May 5, 2011
Perl Programming
I'm a Perl fanatic. Ever since I learned of it waaay back in 1995, I've been able to do amazing things with it, primarily in my job function as a programmer.
As a programmer, I face two daily tasks. First is writing code to accomplish something for the company, i.e. business logic. Web pages for example. The other task is managing piles and piles of data that a company has to manage.
1. Programming requires writing code. Believe it or not, at every job I have used Perl at one time or another to write code for me. Perl is wonderful for code generation. The code I've generated the most of using Perl is: Java.
2. Data is strewn in multiple formats in multiple places. And Perl is supreme at data mining. Whether I'm scraping web pages or tossing binary digits, Perl is awesome.
So if you have to handle data, I suggest Perl is the best fit for the job. It's not that you can't do it with other languages (I've used, and still use, Java, Python, C, AWK, SmallTalk, JavaScript, ActionScript, Ruby, Flex, and LISP); it's just that Perl is handier.
Wednesday, March 3, 2010
My Data-Only Object Notation Converter
Here's my home-rolled method for serializing a data-only object to PON (Perl Object Notation) or JSON. "Data-Only" means that the object only contains scalars, hashes, and arrays -- no classes.
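// To output JSON-style pairs, pass ":" as the pair separator instead of "=>".
// Note: strings are single-quoted and not escaped, so strict JSON parsers may reject the output.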
private function encodePON( obj:Object, pair:String="=>" ):String
{
if ( obj is String ) return "'" + obj + "'";
if ( obj is Number ) return obj.toString();
var out:String = "";
var ary:Array;
if ( obj is Array )
{
out += "[";
ary = new Array();
for each (var item:Object in obj as Array)
ary.push( encodePON( item, pair ) ); // pass the separator down so nested objects use it too
out += ary.join( ", " );
out += "]";
}
else if ( obj is Object )
{
out += "{";
ary = new Array();
for (var key:String in obj)
ary.push( "'" + key + "' " + pair + " " + encodePON( obj[key], pair ) );
out += ary.join( ", " );
out += "}";
}
return out;
}
Tuesday, February 16, 2010
Programming Community HotList for February 2010
Need a skill for a programming job? Here's your most likely "best" choices:
25% - Java
17% - JavaScript
12% - VisualBasic/C#
10% - PHP
10% - Ruby
8% - ActionScript
7% - Perl
5% - C/C++
5% - Python
2% - Delphi/Pascal
Feel free to quote me on that :)
Monday, May 11, 2009
Comprehensive Ruby Archive Network
As I was posting on my Flex blog, I started to wonder if there's a design assumption going on that I hadn't considered.
I was thinking that Ruby could use a repository where individuals can share modules. Basically, a large collection of software and documentation.
Or does it?
Take Perl. Do you want to turn your Perl script into a CGI system? Go to CPAN -- the Comprehensive Perl Archive Network -- and grab the CGI module (okay, bad example, it's included in the standard distribution, but you get the idea). CPAN has bazillions of modules, and is an invaluable resource for handling whatever you need to handle. Proofs of concepts get hammered out, new ideas get born, and thoughtful robust code is freely available. All in one practical place.
But, perhaps Ruby has a higher level of focus. I see RubyForge: support for "projects", i.e. full applications, hinting that the modules there require just a bit too much cohesion to be generally useful. Instead of plug-in modules that you can use to add functionality to an application, does Ruby tend to focus on the applications (or projects) themselves?