You may of course know these two famous posts called To sample or not to sample… (part-2) about data sampling by Mark Hornick. Although very limited in scope, the two posts (imho) very well sketch why we may employ data sampling and how we may lift off table sampling in oracle.
In general, sampling is used to make a representative statement about a collection of data while only regarding a limited random selection, the sample. As long as you are ok to analyze just a sufficient subset of your 1o million rows table for an analysis, you will save your environment a lot of resources and time. On some other scenario, a limited random data selection may also serve verification or testing purposes where, however, not the representativeness but the randomness at a more or less constant sample size, determines the quality of the sample output. Again, as long as you are ok to not exceed this 15 minutes time window overnight, you will be allowed to run that live unit test on any table in question, on 1, 10 or 100 million rows.
In sql, selecting in regard to gain a representative statement will feed the sample function with a requested percentage of rows to sample from. This is what the oracle sample function already offers. Yet another sql to accept a requested actual number of rows to return, independent of the table size, is not available so far (although most people do expect exactly this behaviour when they spot the sql sample function for the first time, weird). The following text will outline a snippet of pl/sql to provide for a sample function to accept the expected number of rows as a parameter.
There are times when people ask me to identify the root causes of excessive Oracle Database (and host) loads in the past, still being astonished how i do manage to get this information out of nothing. Well, really, there is no out of nothing, never. There may be a complex scenario and a couple of tools at hand which span another level of complexity, unfortunately. That’s all of it. You see, complexity is a bad thing, something you will want to avoid or eliminate at all times. Some people furthermore aim to prove their skills in talking about what degree of complexity they are able to handle… hhm, poor ones, I don’t envy them, always struggling to tighten a plethora of strings, over and over, I’m more a convention-over-configuration guy, but that’s another discussion.
Back to to the point, Oracle Database is a vast stack of technology, indeed inducing a fair level of complexity in analysis and so is Oracle Enterprise Manager, the number one monitoring tool for the database and further (so called) targets like hosts and jvms, in operation. In fact, Oracle Enterprise Manager, being based on data collection in the database (AWR – Automatic Workload Repository), data evaluation in the database (ADDM – Automatic Database Diagnostic Monitor) and data transfer to some production instance (Enterprise Manager Agent) constitutes another mountain top in the Oracle landscape not everyone is able or willing to climb. So iff you misplaced your hiking boots for today, the following is for you.
Ok then, log yourself in, using your EM credentials.
Oracle supports hierarchical querying since long and has continuously extended the set of available features, i.e. pseudo-columns, path generation, leaf sorting by name, etc. With 11gR2, Oracle even complements with the ANSI SQL standard approach: the CONNECT BY syntax for hierarchical querying that can now be replaced by the ANSI SQL standard Recursive Subquery Factoring clause, see Oracle RDBMS 11gR2 – goodbye Connect By or: the end of hierarchical querying as we know it for more information.
However, talking about hierarchical querying with Oracle always implies a top-down traversion (see another post on that subject from a couple of years ago that did not provided a PL/SQL ready solution as Implementing bottom-up path traversal for hierarchical tables). You start (START WITH) at some root node(s) of your choice and dive into the branches and leaves (CONNECT BY) by connecting child-nodes to parent-nodes until the final leaf has been consumed (check CONNECT_BY_ISLEAF) or a certain LEVEL has been reached by filter. There is tons of examples out there showing this in action. Here is just some simple code example to start off with, employing SYS_CONNECT_BY_PATH, introducing root- to leaf-node path generation over the child-to-parent relationship between columns T.P_ID and T.ID.
It took me a resort to good old autotrace lately when attempting to optimize a sparql query in oracle’s semantic technology stack. The point was actually, that the still growing graph of data rendered some queries, that were fast once, into lame old rusty cars (another good old story as well).
So, what sort of query do I talk about? Not much of a monster, being stripped down for simplicity just imagine a graph of newspaper articles that relate to some named entities by nodes representing a named entity recognition match and offering a hook to hang up additional match information, say the weight of the match and so on. The query features sort of an entry point (that it was’nt actually!), the uri of the article here. That entry point is just applied as a filter expression to this pattern nodes (?art_1 = article, ?sim_1 = sem item match, ?sitm = sem item). Executing the following simplified stuff originally took around 8 secs run time, cough, cough.