sample

Install the Oracle OE- and SH-schemata for the OrderEntry- and SalesHistory-benchmarks

# - oe takes intermedia, sh takes partitioning and bitmap-indexes, so ee is a must
# - https://oracle-base.com/articles/misc/install-sample-schemas

# [ oracle@my.host.de.de $ ~ ]
mkdir sample-schemas ; cd sample-schemas
wget https://github.com/oracle/db-sample-schemas/archive/v12.2.0.1.zip
unzip v12.2.0.1.zip
cp -R db-sample-schemas-12.2.0.1/* $ORACLE_HOME/demo/schema
cd $ORACLE_HOME/demo/schema
perl -p -i.bak -e 's#__SUB__CWD__#'$(pwd)'#g' *.sql */*.sql */*.dat

# yet install
rlsqlplus system/oracle@my_db_srv
show con_name -- MY_DB_SRV
-- system_password sys_password hr_password oe_password pm_password ix_password sh_password
--   bi_password tablespace_name temp_tablespace_name log_location ez_connect_string
@mksample oracle oracle hr oe pm ix sh bi users temp /u01/app/oracle/product/12.2.0/dbhome_1/demo/schema/log/ my_db_srv
^d

# see the oe/sh log files
less $ORACLE_HOME/demo/schema/log/oe_oc_v3.log
less $ORACLE_HOME/demo/schema/log/sh_v3.log

Getting a raw constant number of rows from oracle’s table sample function

You may of course know these two famous posts called To sample or not to sample… (part-2) about data sampling by Mark Hornick. Although very limited in scope, the two posts (imho) very well sketch why we may employ data sampling and how we may lift off table sampling in oracle.
In general, sampling is used to make a representative statement about a collection of data while only regarding a limited random selection, the sample. As long as you are ok to analyze just a sufficient subset of your 1o million rows table for an analysis, you will save your environment a lot of resources and time. On some other scenario, a limited random data selection may also serve verification or testing purposes where, however, not the representativeness but the randomness at a more or less constant sample size, determines the quality of the sample output. Again, as long as you are ok to not exceed this 15 minutes time window overnight, you will be allowed to run that live unit test on any table in question, on 1, 10 or 100 million rows.
In sql, selecting in regard to gain a representative statement will feed the sample function with a requested percentage of rows to sample from. This is what the oracle sample function already offers. Yet another sql to accept a requested actual number of rows to return, independent of the table size, is not available so far (although most people do expect exactly this behaviour when they spot the sql sample function for the first time, weird). The following text will outline a snippet of pl/sql to provide for a sample function to accept the expected number of rows as a parameter.

(more…)