Parsing AppleWorks and ClarisWorks file formats

Over the past few years, when I have downtime, I sometimes like to reverse engineer abandoned file formats. It is kind of like working on a crossword puzzle with the bonus that any progress you make helps people out there who are trying to archive, index, or convert their old files.

[Image: output of Hex Fiend comparing two files]

I’ve spent a lot of time trying to figure out the file format for AppleWorks and ClarisWorks. My latest approach has been to take a file, make a small change, and then use Hex Fiend to see what changed in the binary.
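For anyone without Hex Fiend, the same "change one thing and compare" approach can be roughly approximated in a few lines of Python. This is only a minimal sketch, not what Hex Fiend does internally: it compares the two files position by position, so it is only useful when the edit does not shift the rest of the file. The function name `diff_ranges` and the command-line usage are my own illustration.

```python
import sys
from pathlib import Path


def diff_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return half-open (start, end) offset ranges where a and b differ.

    Comparison is strictly positional; if one input is longer, the extra
    tail is reported as a difference too.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        longer = max(len(a), len(b))
        if ranges and ranges[-1][1] == common:
            # Merge the tail with a run that reached the end of the shorter input.
            ranges[-1] = (ranges[-1][0], longer)
        else:
            ranges.append((common, longer))
    return ranges


if __name__ == "__main__" and len(sys.argv) == 3:
    before = Path(sys.argv[1]).read_bytes()
    after = Path(sys.argv[2]).read_bytes()
    for start, end in diff_ranges(before, after):
        # Print each differing range with the bytes from both files in hex.
        print(f"0x{start:08x}-0x{end:08x}: "
              f"{before[start:end].hex(' ')} -> {after[start:end].hex(' ')}")
```

Run it with two file paths as arguments (the original file and the one with the small change) to list the offsets that differ. If the change inserted or removed bytes, everything after that point will show up as different, which is where a real diffing hex editor earns its keep.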

After years of on-and-off tinkering and documenting, I finally wrote a basic parser for AppleWorks and ClarisWorks word processor files. I believe this is the first free and open parser for this file format, even if it is ten years too late. I have figured out a lot about the format, but the parser still has a long way to go. You can view my current documentation here and download the source for the parser on GitHub.

The parser so far can read:

  • document version
  • page size
  • margins
  • document content

[Image: output of the parser]

From what I have seen, most people trying to read AppleWorks documents only really care about the document content, but I am very close to figuring out how to parse:

  • styles (bold, italic, underline)
  • footnotes

I may not touch it again for another year, but who knows.

 Link to file format research

 Download source code at GitHub

  • Hans Schmidt

    Hi,
    HEUREKA! (“I found it!”) I believe I found where the content starts. Send me a mail (hopefully you can see my mail address) and I will tell you more. I started to develop something similar, and I have access to ClarisWorks 5 for Windows with a converter to versions 2 and 3 (same file format), which are simpler.

    Just a hint: the content starts after the last 0000 ????h after the first DSET (in ClarisWorks 2). The data between the DSET and the content has something to do with bold and other formatting, but also with the length of the text (I tested multiple variants containing only a, only aa, aaa, and so on).

    I would like to create an import tool for LibreOffice in the long term. You already did a great job, so let us combine our work and develop an even better world. ;-)

    If you want, I can send you files with content of your choosing (in different versions).

    Moreover, it seems that password protection isn’t available in versions 4 and 3 (and so not in 2 either), and the password is simply lost when using the official converter, without any message!

    Regards,
    Dennis (search at the TDF wiki if you can’t see my mail address)

  • teacurran

    Dennis,
    Yes, I know where the content starts and ends. My focus right now is on formatting. I have AppleWorks 5 on Windows, and ClarisWorks 1, Claris 5, and AppleWorks 6 on Mac.

    Check out the GitHub repository; I just checked in code that converts the body of the document and saves it in OpenDocument Format.

    If you have any additional information about the format, I would love to talk to you about it. Feel free to edit my wiki, or fork the code on GitHub and add to it.

    Yeah, as I noted in my wiki, password protection was added in v5, but the document isn’t encrypted, so you can just ignore the password and extract the text without it.

    What is the TDF wiki? The Document Foundation? Is there AppleWorks-related text there?