This is about catching all the files that are not contained in the export provided by WordPress for basic wordpress.com accounts (see https://en.support.wordpress.com/com-vs-org for the differences between the com and org variants of WordPress, and in particular https://en.support.wordpress.com/export for the export options at wordpress.com).
Essentially, hitting export from WP Admin -> Tools -> Export creates an XML file in the so-called “WordPress eXtended RSS” (WXR) format. It is meant to contain all of your content and, in turn, also comprises links to all of your files. Since (well-formed) XML is truly machine readable, we can extract those (https) links to back up all the files.
There are a couple of ways to do the link extraction and file grabbing. Me, I just use Linux shell utilities, for the ease of use in one small and simple call. File grabbing is of course a dedicated wget job, while link extraction can be done with grep or xmllint, whichever you prefer in terms of availability and effort. The difference is, basically, that grep will only succeed as long as each link, including its tags, does not span more than one line, because grep is line oriented.

xmllint, on the other hand, will always catch the text node successfully, no matter how many newlines surround the tags. Anyway, since this is not an issue currently, grep may be safely used and will be much faster for large image and attachment collections.
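To see the line-orientation issue in action, here is a minimal sketch. The file name sample.xml, the function name extract_with_grep and the example.files.wordpress.com URLs are made up for illustration and do not come from a real export.

```shell
#!/bin/bash
# Hypothetical sample: the first URL sits on one line with its tags,
# the second one is surrounded by newlines inside its element.
cat > sample.xml <<'EOF'
<rss xmlns:wp="http://wordpress.org/export/1.2/">
<wp:attachment_url>https://example.files.wordpress.com/2015/01/a.jpg</wp:attachment_url>
<wp:attachment_url>
https://example.files.wordpress.com/2015/01/b.png
</wp:attachment_url>
</rss>
EOF

# Line-oriented extraction: needs the opening tag and the URL on the same line.
# -o prints only the match, -P enables the Perl-style lookbehind (GNU grep).
extract_with_grep() {
  grep -oP '(?<=wp:attachment_url>)[^<]+' "$1"
}

# Prints only the a.jpg URL; the b.png URL is missed.
extract_with_grep sample.xml
```

An XPath query on the text node does not care about those line breaks, which is why xmllint is the safer (if slower) choice for exports formatted like the second entry.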
Having the WXR file at hand, we may proceed using a script as follows:
grep-style file grabbing
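    #!/bin/bash
    # first argument: the WXR export file
    BKP_FILE=$1
    BKP_DIR=/home/.../Wordpress/$(date +"%Y-%m-%d")
    if [ -f "$BKP_FILE" ]
    then
      mkdir -p "$BKP_DIR"
      cp "$BKP_FILE" "$BKP_DIR"
      cd "$BKP_DIR" || exit 1
      # read the copy in the backup dir, so a relative input path still works after cd;
      # wget -x recreates the remote directory tree, -i - reads the URLs from stdin
      grep -oP '(?<=wp:attachment_url>)[^<]+' "$(basename "$BKP_FILE")" | wget -xi -
    else
      echo "File not found"
    fi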
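Before letting wget loose, it may be worth a dry run that only lists what would be fetched. The following is a sketch under my own assumptions: the function name list_attachment_urls and the output file urls.txt are made up, and the extraction is the same grep expression as in the script above.

```shell
#!/bin/bash
# list_attachment_urls FILE: print each attachment URL once, sorted.
# (Hypothetical helper name, not part of any WordPress tooling.)
list_attachment_urls() {
  grep -oP '(?<=wp:attachment_url>)[^<]+' "$1" | sort -u
}

# Only act when an existing file was passed as first argument.
if [ -n "$1" ] && [ -f "$1" ]; then
  list_attachment_urls "$1" > urls.txt
  echo "$(wc -l < urls.txt) files to fetch"
  # wget -xi urls.txt   # uncomment to actually download
fi
```

Writing the list to a file first also leaves a record of what the backup was supposed to contain.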
xmllint-style file grabbing
    # just exchange the grep line like so, a one-liner, may be wrapped here
    # takes one hack to read the namespaced tag (local-name()) and another
    # to have lined output (sed puts a newline before each closing tag)
    xmllint --xpath "//*[local-name()='attachment_url']/text()" <(sed 's/<\/wp:attachment_url>/\n<\/wp:attachment_url>/g' "$BKP_FILE") | wget -xi -
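The sed hack is easier to follow in isolation. This sketch runs the same sed expression on a made-up input with two elements squeezed onto one line. The function name split_closing_tags and the URLs are hypothetical, and the \n in the replacement assumes GNU sed.

```shell
#!/bin/bash
# Same sed expression as in the one-liner above: put a newline in front of
# every closing tag, so that each text node ends with a line break.
# Assumption: GNU sed, which understands \n in the replacement.
split_closing_tags() {
  sed 's/<\/wp:attachment_url>/\n<\/wp:attachment_url>/g'
}

# Made-up input: two attachment elements on a single line.
printf '%s\n' '<wp:attachment_url>https://example.files.wordpress.com/a.jpg</wp:attachment_url><wp:attachment_url>https://example.files.wordpress.com/b.png</wp:attachment_url>' | split_closing_tags
```

As far as I can tell, the point is that the text nodes themselves then carry the line break, so the URLs reach wget one per line even if xmllint prints the nodes back to back.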
… I did not know that I already had so many files over at wordpress.com.