import/export

How to back up your wordpress.com images and attachments

This is about catching all the file material that is not contained in the export provided by WordPress for basic wordpress.com accounts (see https://en.support.wordpress.com/com-vs-org for the differences between the com and org plans of wordpress and in particular https://en.support.wordpress.com/export for the export options at wordpress.com).

Essentially, hitting export from wp admin -> tools -> export creates an xml file in the so-called “WordPress eXtended RSS” (WXR) format that is supposed to contain all of your content and, in turn, also the links to all of your files. Since (well-formed) xml is truly machine readable, we may therefore extract those (https) links and back up all the files.

There are a couple of options for executing the link extraction and file grabbing. I just use linux shell utilities for the ease of a small and simple call. While the file grabbing is of course a dedicated wget job, the link extraction can be done with grep or xmllint, whatever you prefer in terms of availability and effort. The difference is, basically, that grep will only succeed as long as the links, including the tags, do not span more than one line, because grep is line oriented, like so:

<wp:attachment_url>https://bitbach.files.wordpress.com/2009/01/snag-0112.jpg</wp:attachment_url>

xmllint, on the other hand, will always catch the text node successfully, no matter how many newlines surround the tags. Anyway, since this is not an issue currently, grep may be safely used and will be much faster for large image and attachment collections.

Having the WXR file at hand, we may proceed using a script as follows:

  • grep style file grabbing
    #!/bin/bash
    # expects the exported WXR file as its only argument
    BKP_FILE=$1
    BKP_DIR=/home/.../Wordpress/`date +"%Y-%m-%d"`
    if [ -f "$BKP_FILE" ]
      then
        mkdir -p "$BKP_DIR"
        cp "$BKP_FILE" "$BKP_DIR"
        cd "$BKP_DIR"
        # read the copied WXR file (basename, since we changed directories),
        # extract the attachment urls and hand them over to wget
        grep -oP '(?<=wp:attachment_url>)[^<]+' "$(basename "$BKP_FILE")" | wget -xi -
      else
        echo "File not found"
    fi
    
  • xmllint style file grabbing
    # just exchange the grep line like so, a one-liner, may be wrapped here
    # takes one hack to read the namespaced tag and another to have lined output
    xmllint --xpath "//*[local-name()='attachment_url']/text()" <(sed 's/<\/wp:attachment_url>/\n<\/wp:attachment_url>/g' "$(basename "$BKP_FILE")") | wget -xi -
    

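For the record, a typical invocation of the grep style script might look like this (the script name is just an example):

./wp-files-backup.sh export.xml

Thanks to wget -x, the grabbed files end up below the dated backup directory in a tree that mirrors the remote paths, e.g. bitbach.files.wordpress.com/2009/01/snag-0112.jpg.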
… did not know that I was already running so many files over at wordpress.com.

Enjoy, Peter

Oracle character set conversion downgrade checkup with utl_i18n

Doing an export/import, a ctas or an oci/jdbc client action into a database that has a smaller character set, multi- to single-byte for example, will raise the problem of information loss in terms of character data. The information loss, however, is not limited to losing one or the other character. Some characters may also be replaced automatically by a default or best-guess replacement character in the target character set.
A profound analysis of the outcome of the character set conversion is also difficult when only the source database is available and what you need is just an estimation of the information loss. You may come up with good old convert() to execute an input/output compare, but aside from the fact that using convert() is discouraged in the latest releases of Oracle, convert() will also fracture the strings on input as soon as the first multibyte character appears. convert() will alert you that there is some problem, but it will not tell you why or, even better, what character is the (first) stumbling block.
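The package utl_i18n allows a much more telling checkup: convert the data into the target character set and back again and compare the result with the original. A minimal sketch of such a round trip, assuming a hypothetical table customers with a column name and WE8ISO8859P1 as the downgrade target (both just placeholders, more on the actual checkup after the jump):

-- round trip: convert name into the target character set and back, then compare;
-- rows that differ will lose characters or get replacement characters on the downgrade
select name,
       utl_i18n.raw_to_char(
         utl_i18n.string_to_raw(name, 'WE8ISO8859P1'), 'WE8ISO8859P1') as after_conversion
  from customers
 where name <> utl_i18n.raw_to_char(
                 utl_i18n.string_to_raw(name, 'WE8ISO8859P1'), 'WE8ISO8859P1');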

(more…)

Recursively counting files by extension in a windows shell script

It took me sort of light-years to get this tiny piece of code running. Does not seem to be much logic to incorporate, huh? Some looping and counting and printing. However, scripting this old windows shell eventually turned out to be a nightmare… Ok, I learned a lot on the way to this script, about delayed expansion in nested loops (blocks) or assignments of arithmetic expressions. But the lasting lesson learned is: next time install python or something comparable first (or go and learn powershell).

Well, at the end of the day I notice again and again how much I profit from the knowledge available on the net. So I’m sharing this script back for everyone who is interested.

Remarks:

  • get me a count of all files of type (or extension) %1, recursing into any dir within the current dir
  • the pushd / popd stuff is used because %%A cannot be used following “for /r” 😉
  • the !CNT! is used because the changes in the inner loop are not visible to the outer one 😉 😉
@echo off
setlocal enabledelayedexpansion
for /f "delims=" %%A in ('dir /b /a:d') do (
  pushd "%%A"
  set CNT=0
  for /r %%X in (*.%1) do ( set /a CNT+=1 )
  echo %%A !CNT!
  popd
)
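Saved as, say, count_by_ext.cmd (the name is just an example), the script is called from the parent directory with the extension as its single argument and prints one line per subdirectory with the directory name and the file count:

count_by_ext.cmd jpg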

Enjoy!

Resource allocation when importing a yago2 ontology into an oracle semantic database

This is a short review of the resources necessary for importing a full yago2 ontology, which is about 200 million triples, into an oracle semantic database. Some information and snippets about the way to execute the import are given, but this is not the main focus of the article. I’m still on a fresh, not otherwise loaded 11gR2.0.1 database.

The way to execute the import mainly followed the instructions given with the rdf demos, that is, using sqlldr and bulkload.ctl to populate a staging table as proposed in ORACLE_HOME\md\demo\network\rdf_demos and afterwards employing sem_apis.bulk_load_from_staging_table() to actually load the data into the semantic net. bulkload.ctl did in fact not need to be changed at all, the yago2 data being supplied in nt (n-triples) formatting like this:

<Embeth_Davidtz> <http://yago-knowledge.org/resource/actedIn> <Army_of_Darkness> .

and the staging table:

create table yago2_stage (
  RDF$STC_sub varchar2(4000) not null,
  RDF$STC_pred varchar2(4000) not null,
  RDF$STC_obj varchar2(4000) not null,
  RDF$STC_sub_ext varchar2(64),
  RDF$STC_pred_ext varchar2(64),
  RDF$STC_obj_ext varchar2(64),
  RDF$STC_canon_ext varchar2(64)
) compress;

and the sql loader call:

sqlldr userid=lucene/*** control=yago2.ctl data=yago2_1.nt direct=true skip=0 load=95000000 discardmax=10000 bad=yago2.bad discard=yago2.dis log=yago2.log errors=0
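The actual load into the semantic net is then a single call to sem_apis.bulk_load_from_staging_table(). A minimal sketch of that step, assuming a model named yago2 on a hypothetical application table yago2_triples, the staging table from above owned by lucene, and example flags only (the semantic network itself must already exist, see sem_apis.create_sem_network()):

create table yago2_triples (triple sdo_rdf_triple_s);
exec sem_apis.create_sem_model('yago2', 'yago2_triples', 'triple');
-- move the staged rows into the model; parallel index creation just as an example flag
exec sem_apis.bulk_load_from_staging_table('yago2', 'lucene', 'yago2_stage', flags => ' parallel_create_index parallel=4 ');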

(more…)

Reading an xml file as an oracle external table

There are a couple of posts around that imply or claim that an oracle external table is able to read xml files by design. This is not true, really. It may work for you or may actually fit your current xml file structure, but you should generally not try to read xml with external tables. Here is why.

What external tables, or sql loader for that matter, actually can do is read line-oriented, file-based data in bulk and quite fast. They offer a lot of settings to parametrize the loading process, the reader, according to your input format. You may, for example, have a look at the excellent series of posts by Jiri discussing these options, starting here: Oracle External Tables by Examples part 1 – TAB delimited fields.
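For illustration, a minimal sketch of such a regular, line-oriented external table, assuming a directory object data_dir and a tab-delimited file persons.txt (both made up for this example):

-- a plain, line oriented external table: one record per line, fields separated by tab
create table persons_ext (
  first_name varchar2(100),
  last_name  varchar2(100)
)
organization external (
  type oracle_loader
  default directory data_dir
  access parameters (
    records delimited by newline
    fields terminated by X'09'
    missing field values are null
  )
  location ('persons.txt')
)
reject limit unlimited;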

Well, this functionality may also be (mis)used to read xml instances from files. Given a file like the following, one may be interested in extracting the contents of the token tags.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<FILE_INFO>
	<PERSON_BILD>
		<token_row><token>Fritz &amp; Fischer</token></token_row>
		<token_row><token>Boris Borsberg</token></token_row>
	</PERSON_BILD>
</FILE_INFO>

A matching external table definition may then look like this:

(more…)