Undo retention, flashback query and ora-01555 snapshot too old


Setting up a fault aware database environment is in charge of regarding possible physical and logical error scenarios. On the physical side of the medal, you got real-application-cluster or dataguard at your disposal. The logical one usually comprises backup and replication. Recently, oracle and others introduced features that enables a database to recover from logical errors without or with less remote systems and data. Namely flashback is a powerful technology to look back into (query), back out (transaction) or fully restore (database) the history of data. Flashback of course takes accompanying (history) (meta)data to do its job, namely again, undo-before-images, archived-redo-logs and flashback-logs. So, essentially, the convenience of your course into history depends on the amount of history metadata that is available to the database at the time of a logical error, say an inadvertant delete from.

You may already have learned or heard about or even hardly experienced that flashback-logs may be purged from the recovery-area in favour of archived-redo-logs, rendering your configured flashback-retention-time (DB_FLASHBACK_RETENTION_TARGET) a value of theory. So far, sizing the file-system recovery-area is a task to be performed in a clear-sighted manner. The same is true then for undo-before-images (compare to logs) und the undo-tablespace (compare to area) to accomodate enough data to meet your configured undo-retention-time (UNDO_RETENTION) in UNDO_MANAGEMENT=AUTO mode, while in addition still support long running workload queries (not to overwrite old undo). This article discusses the usability of the (enterprise-manager) undo-advisor in sizing your undo-tablespace to always foster successful flashback queries up to the configured undo-retention-time (as opposed to the nasty surprise of getting an ora-01555, see below, which is also misleading here, imho, in suggesting that the undo-tablespace is to small now, as we know it from workload queries – nope, it has been to small before now, in the past, as regarded to flashback queries – also see : ora-01555 snapshot too old when running flashback query).

select count(*) from that_table as of timestamp
  to_timestamp('09.08.2017 17:30:00','dd.mm.yyyy hh24:mi:ss');

Be notified, however, that there at first follows some introduction to the enterprise-manager undo-advisor page contents before we set sails for the practice know how and recommendations. Feel free to skip over to the next bold faced headline.

enterprise-manager undo-advisor outline

Ok, this is where you go to enter the undo advisor pages.

Up here we have a general (analysis) and an activity subpage. On the general page, the current values for the undo-retention, the undo-tablespace size and the undo-tablespace entensibility are given. Following we have initial analysis results for an option #1 time range over the last seven days and regarding the longest workload query ever recorded within. The advisors recommendations comprise a minimum undo-tablespace size to meet the undo-retention setting and a proposed undo-tablespace size according to the past system load. In fact, since the current undo-tablespace size of the example is far below either the minimum and the proposed undo-tablespace size, a problem report has been included as well.

The general page also features a handy visual (chart) representation of the analysis that allows for a kind of a perspective view on the relationship between the undo-retention and the undo-tablespace size. One may determine whether the relationship is (mostly) linear or exponential, i.e. school math, you know. In addition, the lowest best-possible markup hint specifies the undo needed if only long running workload queries but no flashback queries is needed.

The initial analysis results on the general page (option #1 over seven days) are usually sufficient to figure out the magnitudes in question. Adapting the analysis time range to a lower value does not provide any sense, imho, and specifying a undo-retention value maually, which is option #2, is superfluous, since the relationship between undo-retention and undo-tablespace size has already been depicted in the chart discussed above, you just lookup the y = f(x), math again.

However, the parameter space spanned by the longest workload query seen and the longest flashback query planned, options #1 and #2 of the analysis, is though not yet evident. This is where the the second subpage, the activity page comes into play. According to the time range selected on the general page, real numbers are provided for the the longest workload query, the average and the maximum undo-rate as well as any occurences of ora-01555.

Another chart section visualizes the activity statistics, where chart #1 illustrates, as for the example, that (current) workload queries occupy only a very small share of the undo as compared to (potential) flashback queries. If you recently raised your undo-retention parameter or enlarged your undo-tablespace, the yellow area usally shows up as growing to accomodate history undo.

Chart #2 depicts what is interesting for (potential) flashback queries. That is, whatever you configured for undo-retention, according to the space available in the undo-tablespace, this is your very actual possible flashback time. Any attempt to flashback query beyond these minutes in time will result in an ora-01555. So this is our welcome to the real world and the proof that only setting undo-retention is just half the battle.

Last but not least, chart #3 helps to verify whether the are regardable variations in the undo-generation-rate, i.e. whether the available flashback history may become (very much) smaller during specific ranges in the analysis time range. As follows, the available flashback history develops in an inverse relationship to the undo-generation-rate (see the start of the time range compared to the preceding figure).

enterprise-manager undo-advisor practice and pitfalls

In practice and already in a recap, the enterprise-manager undo-advisor may be of considerable help but, however, only in terms of predicting the future of a more or less steady state system.

First of all, undo-advisor sources its analysis data from consecutive ADDM-runs, which in turn evaluate the current database layout and parameters. That is, for example, if you just enlarged your undo-tablespace and triggered another ADDM-run, undo-advisor will already show an increased possible flashback time (the graph almost leaping), which is impossible due to the new space initially being empty (even if a countable number blocks may become flipped from expired to unexpired).

Second to that, it remains obscure, what impact the autoextent-option of the undo-tablespace actually has on the configured undo-retention-time. Oracle states in About the Undo Retention Period that:

  • The UNDO_RETENTION parameter is ignored for a fixed size undo tablespace. The database always tunes the undo retention period for the best possible retention, based on system activity and undo tablespace size.

  • For an undo tablespace with the AUTOEXTEND option enabled, the database attempts to honor the minimum retention period specified by UNDO_RETENTION. When space is low, instead of overwriting unexpired undo information, the tablespace auto-extends. If the MAXSIZE clause is specified for an auto-extending undo tablespace, when the maximum size is reached, the database may begin to overwrite unexpired undo information.

Not true, literally. Me, I always had the autoextent-option on for every file in the undo-tablespace and oracle never (ever) did any space claim on its own to achieve the configured, not yet met undo-retention-time.

The following undo-advisor status snapshots over time may illustrate that somewhat dizzy behaviour. Starting from an undo-tablespace of four files size 8192m max 8192m, I did a couple of different extension changes to just see what happens. Please note that, although not shown for brevity, the undo-generation-rate was almost in a constant bandwidth during monitoring, no spikes or uplifts.

(2017-08-02) Adding <undotbs05.dbf size 8192m max 8192m> and <undotbs06.dbf size 512m max 8192m>, resizing <undotbs06.dbf size 8192m max 8192m> later that day, has the flashback time graph rocketing up, the unexpired blocks share also being raised.

The same is true for the next change (2017-08-03), adding yet another two files <undotbs07/8.dbf size 512m max 8192m> of only 512m initial and 8192m autoextend.


Three days later (2017-08-07) the flashback time graph went down considerably, the estimate being wrong by appr. 30% which may be expected behaviour.

However, the snapshot also proves the fact that oracle does not autoextend for undo flashback, although a total of 15360m allocatable space had been available.


The next change (2017-08-07) manually/explicitely resized <undotbs07.dbf size 8192m max 8192m> and <undotbs08.dbf size 8192m max 8192m> the latest two files up to the max file-size for the undo-tablespace.

While the yellow area of unexpired blocks for flashback did not move in any way, the actual flashback time gained a boost towards the configured undo-retention-time.

I personally cannot follow the air that the number of unexpired blocks remains unaltered by the change. It has been 48g before and after the change, although the total effective size of the undo-tablespace went up to 64g. There should be no relationship between Undo Retention Tuning and Alert Thresholds, right, because I’m running a non-fixed size undo-tablespace, not?

On the other hand, and of real relevance in practice, the increasing possible flashback time suggests, that the change has its sense to achive our goal.

lesson #1: Don’t fool around with non-fixed size undo-tablespaces, aka files autoextent being on. There is no effect, whatsoever.


So, obviously, from 2017-08-10 onwards, the rise of the possible flashback time stalled. We’re around 1.000 mins, targeting 1.440 mins, let’s have the next file add change (2017-08-11) with a total <undotbs09.dbf size 8192m max 8192m> and <undotbs10.dbf size 8192m max 8192m> of another 16g, preallocated.

Ok, the unexpired area extends. Ok, the minutes graph rises and, since the last 16g add change, delivered around 300 mins of additional flashback time, I expect to see my possible flashback time very near the 1.440 mins marker just a few days later… Well, yet the opposite commenced, at a continously comparable undo-generation-rate, i swear, see this screenshot.

lesson #2: Before finishing your undo-retention-time tuning, have at least two workload cycles (maybe working weeks, with batch runs at night and over the weekend) passed.


epilogue

Good old friend SQL still has v$undostat for those not enamoured by enterprise-manager, recording the possible flashback time in a 10 mins schedule. It’ll also give block status information for the undo-tablespace but I don’t trust this with respect to flashback, i.e. we saw above that there is more flashback time available at a constant unexpired block count over time, forget it.

Given that the undo-tablespace has not been changed considerably in the last couple of days, the computed tuned_undoret_ts is quite close to the threshold where ora-01555’s start to appear. I testet this on every other occasion on that database. In most cases, the flashback was even able to go 1 hour or so beyond the value in tuned_undoret_ts into history.

select to_char(begin_time, 'dd.mm.yyyy hh24:mi:ss') stat_begin_time,
 to_char(end_time, 'dd.mm.yyyy hh24:mi:ss') stat_end_time,
 (select round(value/60/60, 1) from v$parameter where name = 'undo_retention') cfgrd_undoret_h,
 round(tuned_undoretention/60/60, 1) tuned_undoret_h,
 to_char(end_time - (tuned_undoretention/60/60/24), 'dd.mm.yyyy hh24:mi:ss') tuned_undoret_ts,
 undoblks, txncount, activeblks, unexpiredblks, expiredblks,
 round(undoblks / ((end_time - begin_time)*24*60*60), 0) as undo_rate_s
from v$undostat
order by end_time desc
;

Have fun, Peter

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s