Parsing Appleworks and Clarisworks file formats

Thursday, January 17th, 2013

Over the past few years, when I have downtime, I sometimes like to reverse engineer abandoned file formats. It is kind of like working on a crossword puzzle with the bonus that any progress you make helps people out there who are trying to archive, index, or convert their old files.

output of hex fiend comparing two files

I’ve spent a lot of time trying to figure out the file format for Appleworks and Clarisworks. My latest approach has been to take a file, make a small change, then use Hex Fiend to compare what has changed in the binary format.
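For anyone who wants to try the same trick without Hex Fiend, here is a minimal Python sketch of the "change one thing, then compare" step. It is not taken from my parser; the function names are made up for illustration, and it only compares bytes at the same offset, so it works best when the edit does not change the file length.

```python
from itertools import zip_longest


def diff_bytes(old: bytes, new: bytes):
    """Return (offset, old_bytes, new_bytes) for each run of differing bytes."""
    runs = []
    start = None  # offset where the current differing run began, if any
    for offset, (a, b) in enumerate(zip_longest(old, new)):
        if a != b:
            if start is None:
                start = offset
        elif start is not None:
            runs.append((start, old[start:offset], new[start:offset]))
            start = None
    if start is not None:
        # The run extends to the end of the longer input.
        runs.append((start, old[start:], new[start:]))
    return runs


def diff_files(path_a, path_b):
    """Compare two files on disk (hypothetical helper, reads both fully)."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return diff_bytes(fa.read(), fb.read())
```

Calling diff_files on a document saved before and after a small edit gives a short list of offsets to stare at. If the edit inserts or removes bytes, everything after that point shows up as one long run, which is itself a hint about where a length field might live.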

After years of off-and-on tinkering and documenting, I finally wrote a basic parser for Appleworks and Clarisworks word processor files. I believe this is the first free and open parser for this file format, even if it is ten years too late. I have figured out a lot about the format, but there is still a long way to go. You can view my current documentation status here and download source for the parser on GitHub.

The parser so far can read:

  • document version
  • page size
  • margins
  • document content

output of parse

From what I have seen, most people trying to read Appleworks documents only really care about the document content, but I am very close to figuring out how to parse:

  • styles – (bold, italic, underline)
  • footnotes

I may not touch it again for another year, but who knows.

 Link to file format research

 Download source code at GitHub

Java keystore cert import on OSX Leopard

Wednesday, March 17th, 2010

This morning I needed to connect my IDE (IntelliJ IDEA) to a FishEye/Jira server that had a self-signed security certificate. Since IntelliJ (or at least the Atlassian plugin) uses Java to connect over HTTPS, it fails because of the JVM’s strict security checking.

Normally when this happens, it is just a matter of installing the certificate into the JVM keystore. There is an article and code that does this for you here. This blog post even has a nice bash wrapper that will download and compile this code for you on OSX.

When I tried to do this today, I got this error every time I tried to run this tool:


After a lot of digging on Google, I finally found the problem.

On Java for OSX 10.6 u1 and 10.5 u6, Apple changed the default keystore password from ‘changeit’ to ‘changeme’.

Such a trivial change, but annoying because changeit had been the Sun default forever. There is a funny post on the Apple Java mailing list where an engineer at Apple apologized and just sort of said they didn’t think it would be a big deal for anyone.
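If you script your certificate imports, one way to cope is to try both passwords. The following is a rough Python sketch, not the tool linked above: the function names, the certificate file, the alias, and the keystore path are all placeholders I made up, and it assumes keytool is on your PATH and exits non-zero when the password is wrong.

```python
import subprocess

# Old Sun default first, then the new Apple default described above.
CANDIDATE_PASSWORDS = ["changeit", "changeme"]


def keytool_import_cmd(cert_path, alias, keystore_path, storepass):
    """Build the keytool command line for importing one certificate."""
    return [
        "keytool", "-importcert", "-noprompt",
        "-alias", alias,
        "-file", cert_path,
        "-keystore", keystore_path,
        "-storepass", storepass,
    ]


def import_cert(cert_path, alias, keystore_path, run=subprocess.run):
    """Try each candidate password; return the one that worked, or None."""
    for password in CANDIDATE_PASSWORDS:
        cmd = keytool_import_cmd(cert_path, alias, keystore_path, password)
        result = run(cmd, capture_output=True)
        if result.returncode == 0:
            return password
    return None
```

Something like import_cert("server.pem", "fisheye", "/path/to/cacerts") would then tell you which default your JVM is using. Note that it also returns None for unrelated failures, such as an alias that already exists, so check the keytool output before blaming the password.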

This post from Matt Fleming has some more info, as well as how to change the keystore password if you decide you don’t like this change:


Monty Widenius is trying to regain control of MySQL and why this is bad for OSS

Saturday, January 2nd, 2010

One of the most widely discussed topics to go around the tech industry last year was the Oracle acquisition of Sun and what this meant for the MySQL database. This topic held up the merger with the US DOJ and currently has it stalled in the EU commission.

One of the primary forces behind these hold ups is a series of FUD articles written by Monty Widenius, the most recent just a few days ago. Monty has a huge following so whenever he writes up one of these articles it gets huge circulation and riles up the Slashdot and LAMP crowds.

I think that the open source community should be very skeptical about anything written by Monty on this topic, and should start looking at the big picture of what this merger means for themselves and the various players involved.

I don’t know why I haven’t been seeing many serious rebuttals to Monty’s posts. I can only guess that it is because everyone working at Sun and Oracle is prohibited from speaking up on the matter.

Here is what I think everyone should consider:

1. Sun is the largest contributor to Open Source in the world

2. Java, which Sun is responsible for, is one of the largest ecosystems of open source software in the world.

IBM, RedHat, the Apache Foundation, Oracle, Google, and hundreds of other companies have based themselves on Java. Java is by far the most used platform out there today. Out of this wide adoption has sprung a massive open source ecosystem that can only be rivaled by Linux. I don’t have any studies but I wouldn’t be surprised if there was much more open source Java code out there than C.

The Java community in my experience, by and large, is very reluctant to touch anything that is not open source. In the past 10 years the community has moved from expensive application servers and IDEs to free alternatives. Projects like JBoss, Glassfish, Tomcat, Eclipse, and Netbeans are the dominant players in the space and have been driving the mindset that to be a player in this market you have to be free.

3. Sun is in trouble and risks going out of business if no one buys it. Talks with IBM broke down, and there aren’t many other companies that can make a purchase.

Not much else to say here. Sun has been having some rough years. Hardware sales are down and they’ve been spending too much on R&D. Sun needs someone to buy them and buy them quickly. They have actively been doing layoffs that affect all of their open source efforts (including MySQL) while this drags out. Further delays, or the blocking of this merger, will only further harm the OSS community.

4. Oracle already owns Berkeley DB and InnoDB, which current versions of MySQL rely on.

This same sort of noise and FUD was made years ago when Oracle bought these two products. Oracle has continued to maintain these and has been a more stable steward of the projects than when they were independent.

5. It doesn’t make business sense for Oracle to try to kill MySQL.

First off, MySQL does not compete with Oracle Database. Anyone who thinks it does does not understand what Oracle Database is. People that use Oracle tend to buy the whole Oracle package (DB, App Server, IDE, Middleware, etc.). There are free alternatives to everything in this stack, including many products owned by Oracle, but companies that want Oracle are companies that want the peace of mind that support for the stack brings.

There are no CTOs out there hemming and hawing about whether to use MySQL or Oracle. It would be like sitting around and trying to decide if you were going to buy a Ford Focus or an M1 Abrams Tank. I’m not using this analogy to point out that Oracle has many more features (it does) or that it is better than MySQL, only that it is different. You would never buy the tank to commute to work or for most of your driving needs. The same is true with MySQL: it is perfect for most projects, and Oracle tends to be a little too heavyweight.

Furthermore, companies that do insist on purchasing the Oracle stack but want to use MySQL would now be able to buy the support stack with MySQL in it. Oracle can now sell the complete support package and their customers can feel good about getting everything from one vendor. Companies that buy Oracle are most likely the companies that would be paying for MySQL support as well. If a customer comes to Oracle, what does Oracle care which database the customer wants to use when it owns both?

The last thing Oracle would want to do is alienate a large developer community. Changing anything about MySQL would hugely upset not just the LAMP and Java communities but just about everyone on the planet. This is just bad business.

6. All Oracle will own is a trademark and some engineers.

The source code for MySQL is already free. Anyone can fork it off and start another project and attempt to gain community support around their new project. The only thing they can’t do is call it MySQL. Monty has already started one such fork called MariaDB.

Open source projects are about the community rallying around ideas, not around companies. Monty argues that a forked product could never compete with MySQL without the name recognition. This isn’t true. If the community feels that Oracle is doing a bad job as a maintainer, but someone else is releasing new features on some other project, people will switch very quickly. We see this all the time in the Linux world with the community switching from one fork to another of a large project.

Monty says that forks can never happen because they would need funding and resources. What this argument ignores is that large companies with a lot invested in MySQL could step up if the project is faltering. If Oracle were to stop releasing updates, do you think Google is going to sit around and do nothing? The community would jump ship to GoogSQL or whatever if it came to that and was seen as a better product.

7. Monty Widenius has the most to gain from Oracle divesting MySQL.

Like everything else in the world, when there is an argument, you need to step back and ask yourself who has the most to gain. Widenius sold MySQL for a hefty personal gain and is now trying to wrest back control by spreading fear throughout the MySQL community.

Monty has been making noise since September 2008 (before the announced Oracle-Sun merger) and complaining about the direction of the project. He didn’t feel Sun was doing a good job and immediately started calling for forks and a change of direction. The community heard Widenius out but didn’t build up a ton of support for his ideas because, by and large, most people are satisfied with the job Sun is doing.

Right before the Sun ownership, MySQL was in the process of rolling out a non-free enterprise edition and telling people that they would have to pay for new features. The company I was working for at the time had MySQL sales reps and consultants flat out tell us that we would need to purchase a support agreement if we wanted to use the Falcon Engine or clustering past v5.5. Sun put a stop to this.

In Summary

I am not under any illusions that an Oracle-Sun merger would be all sunshine and roses. I think that Sun has developed a culture and business model around everything being open and free and Oracle has not. Oracle will need to make some big changes about how it does business in order for the merger to work.

I would prefer if Sun could remain an independent company, but we have to face facts: Sun is in trouble and there aren’t many other companies that can bail them out. If Sun is allowed to continue its downward spiral then we are facing a great loss to the open source community.

I am not worried about the Oracle stewardship of any of Sun’s open source products precisely because of the community support. Oracle can’t afford to make huge changes and alienate hundreds of thousands of developers who have some say in how much money their companies give to Oracle. Doing something like killing or even changing GlassFish, VirtualBox, ZFS, or any other OSS project could lead to fewer sales in its existing stack of software.

Picture Pump

Monday, April 6th, 2009

About five years ago Ben Sisto had an idea for an application that I thought would be useful. The idea was that if you want to download images from Google image search, it takes forever to go through the results, wait for them to load, and save the images that aren’t 404. The application Ben envisioned would simply download every result and ignore the ones that were 404 or took too long to load. Once you have all the results on your hard drive it is a lot easier to sift through them.
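The core loop is simple enough to sketch in a few lines. This is an illustrative Python version, not the actual Picture Pump source (which is a Java application); the function names, the five second default timeout, and the file naming scheme are all assumptions, and it leaves out the part that scrapes the search results to get the list of image URLs.

```python
import os
import urllib.request


def fetch_url(url, timeout):
    """Download one URL; raises an OSError subclass on HTTP errors or timeouts."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, dest_dir, timeout=5.0, fetch=fetch_url):
    """Save every URL that loads; silently skip 404s and slow hosts."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        try:
            data = fetch(url, timeout)
        except OSError:
            # urllib's HTTPError/URLError and socket timeouts all derive
            # from OSError, so dead links and stalled hosts land here.
            continue
        # Prefix with the result index so duplicate file names don't collide.
        base = os.path.basename(url.split("?")[0]) or "image"
        path = os.path.join(dest_dir, f"{index:04d}_{base}")
        with open(path, "wb") as out:
            out.write(data)
        saved.append(path)
    return saved
```

One caveat: the timeout passed to urlopen applies to each blocking socket operation, not to the download as a whole, so a host that trickles data slowly can still hold things up. Downloading in parallel would be the obvious next step.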

I quickly banged out a release that was workable in a few days and never really looked at it again.

This week I had an exact need for this application again. When I fired it up, it seemed Google had changed their site, so it wasn’t working anymore. I spent some time tonight fixing it up and got it working much better than the original.

So here it is. v1.0 (I guess) release of Picture Pump (named after the site Ben ran at the time, HoneyPump):

Click launch to start/install the application. OSX users should be all set; Windows users might have to get Java first.

also, hey look. source code

Email me with any suggestions or patches.

UPDATE: 10/24/2009
– fixed my MIME settings so that the launch button works again (sorry, I moved servers and forgot to set it)
– fixed a few bugs
– added support for safe search
– added filtering by image type and license
– source code is now under the Apache License 2.0