November | 2015 | Bitbach's Blog

This post is about backing up all the files (images and attachments) that are not contained in the export WordPress provides for basic wordpress.com accounts (see https://en.support.wordpress.com/com-vs-org for the differences between the .com and .org flavors of WordPress and, in particular, https://en.support.wordpress.com/export for the export options at wordpress.com).

Essentially, hitting Export under wp admin -> Tools -> Export creates an XML file in the so-called “WordPress eXtended RSS” (WXR) format. This file contains all of your content and also links to all of your files, but not the files themselves. Since (well-formed) XML is truly machine readable, we can extract those (https) links and use them to back up all the files.

There are a couple of ways to do the link extraction and the file grabbing. Personally, I just use Linux shell utilities, since the whole job fits into one small and simple call. File grabbing is clearly a job for wget, whereas link extraction can be done with either grep or xmllint, whichever you prefer in terms of availability and effort. The basic difference is that grep is line oriented and will therefore only succeed as long as each link, including its tags, does not span more than one line, like so:

<wp:attachment_url>https://bitbach.files.wordpress.com/2009/01/snag-0112.jpg</wp:attachment_url>

xmllint, on the other hand, parses the XML and will always catch the text node, no matter how many newlines surround the tags. Since the current export keeps each link and its tags on a single line, grep can safely be used and will be much faster for large image and attachment collections.
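To see the line-orientation limit in action, here is a minimal, self-contained sketch. The sample fragment, its example.files.wordpress.com URLs and the helper name extract_attachment_urls are made up for illustration; only the grep expression is taken from the script below. The second attachment has its URL wrapped onto a line of its own, and grep silently skips it.

```shell
#!/bin/bash
# Hypothetical helper: print every attachment URL matched by the grep pattern from the post
extract_attachment_urls() {
  grep -oP '(?<=wp:attachment_url>)[^<]+' "$1"
}

# Made-up WXR fragment: the first URL sits on one line, the second one is wrapped
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<rss xmlns:wp="http://wordpress.org/export/1.2/"><channel>
<item><wp:attachment_url>https://example.files.wordpress.com/2009/01/one-line.jpg</wp:attachment_url></item>
<item><wp:attachment_url>
https://example.files.wordpress.com/2009/01/wrapped.jpg
</wp:attachment_url></item>
</channel></rss>
EOF

# Only one-line.jpg is printed: wrapped.jpg never shares a line with its opening tag
extract_attachment_urls "$sample"
```

Run against the same fragment, the xmllint XPath used further down would return both text nodes, the second one including its surrounding newlines.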

Having the WXR file at hand, we can proceed with a script like this:

grep style file grabbing

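#!/bin/bash
# usage: pass the WXR export file as the only argument
BKP_FILE=$1
BKP_DIR=/home/.../Wordpress/$(date +"%Y-%m-%d")
if [ -f "$BKP_FILE" ]
  then
    mkdir -p "$BKP_DIR"
    cp "$BKP_FILE" "$BKP_DIR"
    cd "$BKP_DIR" || exit 1
    # the copy in $BKP_DIR carries the base name only, so read that one after the cd
    # grep -o prints just the URL; wget -x recreates host/path dirs, -i - reads URLs from stdin
    grep -oP '(?<=wp:attachment_url>)[^<]+' "$(basename "$BKP_FILE")" | wget -xi -
  else
    echo "File not found" >&2
    exit 1
fi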
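Once wget has finished, a rough completeness check can be sketched in the same shell style. wget -x saves a URL such as https://bitbach.files.wordpress.com/2009/01/snag-0112.jpg as bitbach.files.wordpress.com/2009/01/snag-0112.jpg below the current directory, so each extracted URL can be mapped to its expected local path. The function name report_missing is made up, and the sketch assumes URLs without query strings and wget's default host directories (no -nH).

```shell
#!/bin/bash
# Hypothetical helper: list attachment URLs from a WXR file whose wget -x copy is missing
# Assumes URLs without query strings and wget's default host-named top directory
report_missing() {
  local wxr=$1 url path
  grep -oP '(?<=wp:attachment_url>)[^<]+' "$wxr" | sort -u | while IFS= read -r url; do
    path=${url#*://}   # wget -x saves https://host/a/b.jpg as host/a/b.jpg
    [ -f "$path" ] || echo "missing: $url"
  done
}
```

Call it from inside the dated backup directory with the name of the copied export file; no output means every linked file arrived.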

xmllint style file grabbing

# just replace the grep line with this one-liner (it may appear wrapped here)
# one hack, local-name(), matches the namespaced tag without declaring the wp namespace;
# another, the sed call, puts a newline into each tag so that every URL ends up on a line of its own
xmllint --xpath "//*[local-name()='attachment_url']/text()" <(sed 's/<\/wp:attachment_url>/\n<\/wp:attachment_url>/g' "$(basename "$BKP_FILE")") | wget -xi -
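The effect of the sed hack can be checked in isolation. The sketch below uses a made-up two-attachment fragment. In the XML, the line break after each closing tag sits outside the element, so it is not part of the text node the XPath selects; sed moves a newline inside the element, right before the closing tag, so each printed URL ends with one. The \n in the replacement is a GNU sed feature, and the xmllint step only runs if xmllint is installed.

```shell
#!/bin/bash
# Made-up WXR fragment with two attachment URLs
wxr='<rss xmlns:wp="http://wordpress.org/export/1.2/">
<wp:attachment_url>https://example.files.wordpress.com/a.jpg</wp:attachment_url>
<wp:attachment_url>https://example.files.wordpress.com/b.jpg</wp:attachment_url>
</rss>'

# The sed step from the one-liner: a newline goes in front of every closing tag (GNU sed)
lined=$(printf '%s\n' "$wxr" | sed 's/<\/wp:attachment_url>/\n<\/wp:attachment_url>/g')
printf '%s\n' "$lined"

# If xmllint is available, run the XPath from the post on the modified XML read from stdin
if command -v xmllint >/dev/null 2>&1; then
  printf '%s\n' "$lined" | xmllint --xpath "//*[local-name()='attachment_url']/text()" -
fi
```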

… I did not know that I already had so many files over at wordpress.com.

Enjoy, Peter



Just another Oracle weblog @ WordPress


How to backup your wordpress.com images and attachments