The process of converting my old WordPress posts to Webby was relatively painless, but there are a few things worth sharing.
The first step was to export my WordPress MySQL database and create a local copy, and then to create DataMapper classes corresponding to the two tables I was interested in, wp_posts and wp_comments.
mysql> describe wp_posts;
+-----------------------+---------------------+
| Field                 | Type                |
+-----------------------+---------------------+
| ID                    | bigint(20) unsigned |
| post_author           | bigint(20)          |
| post_date             | datetime            |
| post_date_gmt         | datetime            |
| post_content          | longtext            |
| post_title            | text                |
| post_category         | int(4)              |
| post_excerpt          | text                |
| post_status           | varchar(20)         |
| comment_status        | varchar(20)         |
| ping_status           | varchar(20)         |
| post_password         | varchar(20)         |
| post_name             | varchar(200)        |
| to_ping               | text                |
| pinged                | text                |
| post_modified         | datetime            |
| post_modified_gmt     | datetime            |
| post_content_filtered | text                |
| post_parent           | bigint(20)          |
| guid                  | varchar(255)        |
| menu_order            | int(11)             |
| post_type             | varchar(20)         |
| post_mime_type        | varchar(100)        |
| comment_count         | bigint(20)          |
+-----------------------+---------------------+
24 rows in set (0.01 sec)

mysql> describe wp_comments;
+----------------------+---------------------+
| Field                | Type                |
+----------------------+---------------------+
| comment_ID           | bigint(20) unsigned |
| comment_post_ID      | int(11)             |
| comment_author       | tinytext            |
| comment_author_email | varchar(100)        |
| comment_author_url   | varchar(200)        |
| comment_author_IP    | varchar(100)        |
| comment_date         | datetime            |
| comment_date_gmt     | datetime            |
| comment_content      | text                |
| comment_karma        | int(11)             |
| comment_approved     | varchar(20)         |
| comment_agent        | varchar(255)        |
| comment_type         | varchar(20)         |
| comment_parent       | bigint(20)          |
| user_id              | bigint(20)          |
+----------------------+---------------------+
15 rows in set (0.00 sec)
And no, I don’t know why wp_posts.ID is a bigint(20) unsigned while wp_comments.comment_post_ID, which refers to it and should be the same type, is an int(11). This database has been upgraded a few times, so perhaps that’s a legacy quirk.
While DataMapper can easily accept a non-standard primary key in a table, it gets a little trickier when you are linking two tables together using has n and belongs_to. I found it simpler to just change the names of the primary keys and foreign key. So, after creating a new database and loading the mysqldump file with all my blog’s data, I ran the following:
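-- CHANGE takes the full column definition, so restate NOT NULL and
-- AUTO_INCREMENT (as WordPress creates these keys) or they are dropped.
ALTER TABLE wp_posts CHANGE ID id bigint(20) unsigned NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_comments CHANGE comment_ID id bigint(20) unsigned NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_comments CHANGE comment_post_ID post_id int(11);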
Update: I think I cracked the custom parent_key, child_key bit in DataMapper.
class Post
  # child_key names the foreign-key column on the child (comments) side;
  # parent_key names the column it refers to on the parent (posts) side.
  has n, :comments,
    :parent_key => [:ID],
    :child_key  => [:comment_post_ID]
end

class Comment
  belongs_to :post,
    :parent_key => [:ID],
    :child_key  => [:comment_post_ID]
end
See parent_key_example.rb for a full working example. This should remove the need to rename the fields as above, but I haven’t fully tested it.
One of the really nice things about DataMapper is that it will happily ignore any fields in your database that you don’t mention explicitly, so you only have to define DataMapper properties for the fields you want to work with. The top of my post.rb file looks like this:
class Post
  include DataMapper::Resource
  storage_names[:default] = 'wp_posts'

  property :id, Integer, :serial => true # original field name ID
  property :post_date, DateTime
  property :post_content, Text
  property :post_title, String
  property :post_status, String
  property :post_name, String

  has n, :comments, :comment_approved => true, :order => [:comment_date]
And my comment.rb file starts with:
class Comment
  include DataMapper::Resource
  storage_names[:default] = 'wp_comments'

  property :id, Integer, :serial => true # original field name comment_ID
  property :post_id, Integer             # original field name comment_post_ID
  property :comment_author, String
  property :comment_author_url, String
  property :comment_date, DateTime
  property :comment_content, String
  property :comment_approved, Boolean
  property :user_id, Integer

  belongs_to :post
So, just like that I can access all my posts and comments using DataMapper classes, and I can do things like post.comments.
The initialization for DataMapper is simply:
require "rubygems"
require "dm-core"
require "fileutils" # FileUtils is used by Post#publish and Post.publish_all

DataMapper.setup(:default, 'mysql://localhost/ananelson_wordpress?socket=/tmp/mysql.sock')

# Local files
require "lib/comment"
require "lib/post"
Now, how do I get the content formatted nicely? WordPress takes the data stored in the database and feeds it through a PHP function called the_content.
// This is an excerpt from the WordPress source code. http://wordpress.org/about/gpl/
function the_content($more_link_text = '(more...)', $stripteaser = 0, $more_file = '') {
    $content = get_the_content($more_link_text, $stripteaser, $more_file);
    $content = apply_filters('the_content', $content);
    $content = str_replace(']]>', ']]&gt;', $content);
    echo $content;
}
The apply_filters function is the thing that interests me. More digging in the WordPress source revealed:
// This is an excerpt from the WordPress source code. http://wordpress.org/about/gpl/
add_filter('the_content', 'wptexturize');
add_filter('the_content', 'convert_smilies');
add_filter('the_content', 'convert_chars');
add_filter('the_content', 'wpautop');
add_filter('the_content', 'prepend_attachment');
# snip...
add_filter('comment_text', 'wptexturize');
add_filter('comment_text', 'convert_chars');
add_filter('comment_text', 'make_clickable', 9);
add_filter('comment_text', 'force_balance_tags', 25);
add_filter('comment_text', 'convert_smilies', 20);
add_filter('comment_text', 'wpautop', 30);
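Each add_filter call above registers a callback with an optional priority (WordPress’s default is 10). apply_filters then runs the callbacks from the lowest priority number to the highest, and callbacks that share a priority run in the order they were added. The ordering rule is easy to misread from the listing, so here is a minimal Ruby sketch of it. FilterChain is a hypothetical class written for illustration; it is not a port of WordPress’s code.

```ruby
# Minimal sketch of WordPress-style filter ordering (hypothetical FilterChain
# class). Lower priority numbers run first; equal priorities keep the order
# in which they were added.
class FilterChain
  DEFAULT_PRIORITY = 10

  def initialize
    @filters = Hash.new { |hash, tag| hash[tag] = [] }
  end

  # Register a callback under `tag`; `name` is only used for reporting.
  def add_filter(tag, name, priority = DEFAULT_PRIORITY, &callback)
    entries = @filters[tag]
    entries << [priority, entries.size, name, callback]
  end

  # Names of the callbacks registered for `tag`, in execution order.
  def order(tag)
    sorted(tag).map { |_priority, _index, name, _callback| name }
  end

  # Pass `value` through every callback for `tag`, in execution order.
  def apply_filters(tag, value)
    sorted(tag).reduce(value) { |acc, (_priority, _index, _name, callback)| callback.call(acc) }
  end

  private

  # Sorting on [priority, insertion index] keeps equal priorities in order.
  def sorted(tag)
    @filters[tag].sort_by { |priority, index, _name, _callback| [priority, index] }
  end
end

# Register the comment_text filters with the priorities quoted above,
# using identity callbacks since only the order matters here.
chain = FilterChain.new
identity = ->(text) { text }
chain.add_filter("comment_text", "wptexturize", &identity)
chain.add_filter("comment_text", "convert_chars", &identity)
chain.add_filter("comment_text", "make_clickable", 9, &identity)
chain.add_filter("comment_text", "force_balance_tags", 25, &identity)
chain.add_filter("comment_text", "convert_smilies", 20, &identity)
chain.add_filter("comment_text", "wpautop", 30, &identity)

p chain.order("comment_text")
# prints ["make_clickable", "wptexturize", "convert_chars", "convert_smilies", "force_balance_tags", "wpautop"]
```

One consequence: for comments, WordPress runs make_clickable (priority 9) before wptexturize. The shell script below calls it after wptexturize and convert_chars instead. That may not matter in practice, but it is one place where the output could differ from what WordPress itself would render.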
So, WordPress has a number of filters that are applied to the post content and to the comments after the text is pulled out of the database. The simplest way I could think of to replicate this behaviour was to use the same WordPress filters. I decided that I could live without convert_smilies, and that there was no reason not to use make_clickable for my posts as well as for the comments, which left me with a single list of filters for both. I wrote a short PHP-based shell script:
#!/usr/bin/env php
<?php
// Local copies of the WordPress wp-includes/ files that define the filters.
include 'wp/plugin.php';
include 'wp/kses.php';
include 'wp/formatting.php';
include 'wp/shortcodes.php';

// Read the text to format from the file named on the command line.
$text = file_get_contents($argv[1]);

// The same filters WordPress applies to posts and comments, minus convert_smilies.
$text = wptexturize($text);
$text = convert_chars($text);
$text = make_clickable($text);
$text = force_balance_tags($text);
$text = wpautop($text);

echo $text;
?>
Then I just had to wrap the shell script in Ruby.
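def wp_format(text)
  tmpfile = "temp.txt"
  File.open(tmpfile, 'w') { |f| f.write(text) }
  result = `./wp_format #{tmpfile}` # run the PHP script above on the file
  puts result                       # print the formatted text to the console
  result
ensure
  # Remove the temporary file even if the script fails.
  File.delete(tmpfile) if tmpfile && File.exist?(tmpfile)
end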
For some reason Ruby’s Tempfile library gave me some strange filenames which either got garbled or weren’t palatable to system(), so I just used “temp.txt”. You could always add a timestamp if you wanted to.
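One possible explanation for the Tempfile trouble is the command line rather than Tempfile itself. Backticks take the command as a single string, so spaces or shell metacharacters in a temporary path can split or mangle it. The sketch below sidesteps that by passing the command and the path to Open3.capture2 as separate arguments, so no shell ever parses the path. It assumes Ruby 2.1 or later for Tempfile.create, which is newer than the Ruby this post was written against, and wp_format_via_tempfile is a hypothetical name.

```ruby
require "tempfile"
require "open3"

# Hypothetical alternative to wp_format: Tempfile picks a unique file name,
# and Open3.capture2 receives the command and the path as separate
# arguments, so the path is never interpreted by a shell.
def wp_format_via_tempfile(text, command = "./wp_format")
  Tempfile.create(["wp_format", ".txt"]) do |f|
    f.write(text)
    f.flush # make sure the script sees the full text
    output, status = Open3.capture2(command, f.path)
    raise "#{command} exited with status #{status.exitstatus}" unless status.success?
    output
  end # Tempfile.create closes and deletes the file when the block exits
end
```

The optional command argument exists only so the function can be tried without the PHP script. For example, wp_format_via_tempfile("some text", "cat") simply echoes the text back.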
Now I need to recreate the permalink scheme I had set up in WordPress.
  def filedir
    location = "../content/" # relative path to the Webby content dir
    location + "said/on/" + post_date.strftime("%Y/%m/%d/") + post_name
  end

  def filename
    filedir + "/index.txt"
  end
I used a directory “said/on” (yeah, sorry, I was feeling too clever that day) followed by Year/Month/Day and then the post slug. So, in my Post class I have two methods: filedir, which builds that directory path with the post slug at the end, and filename, which adds index.txt inside that directory (.txt since this is going into Webby).
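filedir and filename depend only on the post date and slug, so the scheme can be checked without a database. Here is a sketch that produces the same paths as plain functions using File.join. The names permalink_dir and permalink_file, and the example date and slug, are hypothetical.

```ruby
require "date"

# Hypothetical standalone versions of Post#filedir and Post#filename.
# said/on/YYYY/MM/DD/slug mirrors the old WordPress permalinks.
def permalink_dir(post_date, post_name, content_dir = "../content")
  File.join(content_dir, "said", "on", post_date.strftime("%Y/%m/%d"), post_name)
end

# The Webby page source for a post: index.txt inside the post's directory.
def permalink_file(post_date, post_name, content_dir = "../content")
  File.join(permalink_dir(post_date, post_name, content_dir), "index.txt")
end

puts permalink_file(DateTime.new(2008, 3, 7, 10, 0, 0), "hello-world")
# prints ../content/said/on/2008/03/07/hello-world/index.txt
```

strftime zero-pads the month and day, so the directories sort chronologically as well as matching the old URLs.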
Finally, I need code which formats comments and posts, and then a method to iterate over all published posts and all approved comments to print them in that format.
In post.rb:
  def webby_header
    %{---
title: #{post_title}
created_at: #{post_date.to_s}
---
}
  end

  def publish
    FileUtils.mkdir_p(filedir)
    File.open(filename, "w") do |f|
      f.write(webby_header)
      if [33].include?(id) # Post no. 33 and wp_format don't get along.
        f.write(post_content)
      else
        f.write(wp_format(post_content))
      end
      unless comments.empty?
        f.write("\n\n<hr>\n\n<h3>Comments</h3>\n")
        comments.each do |c|
          f.write(c.to_html)
        end
      end
    end
  end

  def self.publish_all
    FileUtils.rm_rf("../content/said") # start from a clean said/ tree
    Post.all(:post_status => 'publish').each do |p|
      p.publish
    end
  end
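One thing to watch in webby_header: the header is YAML, and a post title containing “: ” (say, “Update: more DataMapper”) makes the title: line invalid YAML. One possible fix is to let Ruby’s YAML library quote the title only when it needs quoting, so plain titles come out exactly as before. Here it is sketched as a hypothetical webby_header_for function.

```ruby
require "yaml"

# Hypothetical variant of Post#webby_header: YAML.dump quotes the title
# only when needed (for example when it contains ": ").
def webby_header_for(title, created_at)
  # YAML.dump("title" => ...) returns "---\ntitle: ...\n"; drop the "---" line.
  title_line = YAML.dump("title" => title.to_s).sub(/\A---\n/, "")
  "---\n#{title_line}created_at: #{created_at}\n---\n"
end

print webby_header_for("Update: more DataMapper", "2008-03-07T10:00:00+00:00")
```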
comment.rb:
  def author_with_url
    if comment_author_url.to_s.empty?
      comment_author
    else
      %{<a href="#{comment_author_url}">#{comment_author}</a>}
    end
  end

  def to_html
    %{
<b>#{author_with_url}</b> #{comment_date.strftime("%d %b %Y")}
#{wp_format(comment_content)}
}
  end
Not the most beautiful of code, but I’m only using it once and it works.
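author_with_url interpolates the author name and URL straight into HTML. That is fine if the stored values are trusted. If they might contain characters like & or <, a small hardening step is to escape them first. Below is a sketch with a hypothetical author_with_url_for function, assuming the database holds unescaped text. If WordPress already stored entities such as &amp;, this would double-escape them.

```ruby
require "erb"

# Hypothetical variant of Comment#author_with_url that HTML-escapes the
# author name and URL before interpolating them into markup.
def author_with_url_for(author, url)
  name = ERB::Util.html_escape(author.to_s)
  return name if url.to_s.strip.empty?

  %{<a href="#{ERB::Util.html_escape(url.to_s)}">#{name}</a>}
end

puts author_with_url_for("Tom & Jerry", "http://example.com/?a=1&b=2")
# prints <a href="http://example.com/?a=1&amp;b=2">Tom &amp; Jerry</a>
```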
So, when I call Post.publish_all, I get a directory tree in my Webby content directory with one said/on/YYYY/MM/DD/post-slug/index.txt file for each published post.

And the next time I call rake build, each of those text files will be converted to an HTML page.
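Before running rake build, it can help to confirm that every published post produced a page. A sketch of such a check follows. generated_pages is a hypothetical helper that lists the index.txt files under the said/on/YYYY/MM/DD/slug layout, relative to the content directory, so the count can be compared with the number of published posts.

```ruby
# Hypothetical helper: list the page sources written by Post.publish_all,
# relative to content_dir, e.g. "said/on/YYYY/MM/DD/slug/index.txt".
def generated_pages(content_dir = "../content")
  pattern = File.join(content_dir, "said", "on", "*", "*", "*", "*", "index.txt")
  Dir.glob(pattern).map { |path| path.delete_prefix("#{content_dir}/") }.sort
end

# Usage idea: compare generated_pages.size with the number of posts
# returned by Post.all(:post_status => 'publish').
```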
I ignored tags and categories, and I didn’t have to deal with images in any of my blog posts, which made this job easier. I did have to tweak the output by hand for two of the posts. In one of them, quotation marks were turned into some bizarre character and, since there were only six of them, I fixed them by hand. The other resisted wp_format completely, so I excluded it from formatting (that’s the post no. 33 check in publish) and added a Webby textile filter to it instead, which worked just fine.
If I had more posts to convert I would have investigated the reasons behind these problems and adjusted my code accordingly, but in this case it made sense to just fix them.
So, there you are. A relatively painless export. I can see that DataMapper is going to be my tool of choice for quickly working with legacy databases and exporting or reformatting them. It’s so quick to set up, and then you have access to any Ruby library you need to help you process your data.
You are free to make use of any of these scripts subject to the terms of the GPL. We really, really need a decent license for code snippets that fits in a single-line comment. I’m going with GPL on this one since that is WordPress’s license and I’m using bits of their code here. But if you want to do something similar to what I have done here that doesn’t involve WordPress, you can consider the code I have written to be in the public domain or, if you prefer, under the MIT license. And that’s the code, not the blog post. Of course, if you find this useful I’d love to hear about it in the comments, by email or on your blog.
If you are looking for any of my old posts, there is a list of them here.