Parsing AppleWorks and ClarisWorks file formats

Over the past few years, I have spent some of my downtime reverse engineering abandoned file formats. It is a bit like working on a crossword puzzle, with the bonus that any progress helps people who are trying to archive, index, or convert their old files.

[Image: output of Hex Fiend comparing two files]

I’ve spent a lot of time trying to figure out the file format for AppleWorks and ClarisWorks. My latest approach has been to take a file, make a small change, save it as a copy, and then use Hex Fiend to compare the two and see what changed in the binary format.
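For anyone who wants to try the same change-and-compare trick without Hex Fiend, here is a minimal Python sketch. It is not part of my parser, and the function names are made up for this example. It assumes a plain position-by-position comparison, which is cruder than what Hex Fiend does: if an edit inserts or removes bytes, everything after that point will show up as different.

```python
from pathlib import Path


def diff_runs(a: bytes, b: bytes):
    """Compare two byte strings position by position.

    Returns a list of (offset, old_bytes, new_bytes) tuples, one per
    contiguous run of differing bytes. If the inputs differ in length,
    the extra tail is reported as a final run.
    """
    runs = []
    n = min(len(a), len(b))
    i = 0
    while i < n:
        if a[i] != b[i]:
            start = i
            # Extend the run while the bytes keep differing.
            while i < n and a[i] != b[i]:
                i += 1
            runs.append((start, a[start:i], b[start:i]))
        else:
            i += 1
    if len(a) != len(b):
        # One file is longer; report the leftover bytes.
        runs.append((n, a[n:], b[n:]))
    return runs


def diff_files(path_a, path_b):
    """Read two files and print each differing run as hex."""
    runs = diff_runs(Path(path_a).read_bytes(), Path(path_b).read_bytes())
    for offset, old, new in runs:
        print(f"0x{offset:08x}: {old.hex(' ')} -> {new.hex(' ')}")
    return runs
```

Calling diff_files on the original document and the edited copy prints one line per changed region, which is usually enough to spot which bytes hold the setting you just changed.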

After years of on-and-off tinkering and documenting, I finally wrote a basic parser for AppleWorks and ClarisWorks word processor files. I believe this is the first free and open parser for this file format, even if it is ten years too late. I have figured out a lot about the format, but the work still has a long way to go. You can view my notes on the format so far and download the source for the parser on GitHub (links below).

The parser so far can read:

  • document version
  • page size
  • margins
  • document content

[Image: output of the parser]

From what I have seen, most people trying to read AppleWorks documents only care about the document content, but I am close to figuring out how to parse:

  • styles (bold, italic, underline)
  • footnotes

I may not touch it again for another year, but who knows.

 Link to file format research

 Download source code at GitHub