Simulating sampling in PL/SQL

Jan Jansen

The standard PL/SQL package dbms_random only allows for drawing uniformly distributed (all numbers equally likely) random numbers. As far as I know, other discrete probability distributions have to be simulated. This is easily accomplished using a little elementary probability theory.

The following code allows for drawing a sample of a size to be specified at runtime from a discrete probability distribution on an arbitrary (though unfortunately necessarily finite) set of positive integers. It is just meant for illustrating a standard math trick and is neither robust nor optimized (nor wellformed :-(. It’s just a quick hack. Perhaps I will look into this.

The sample space used and its discrete probability distribution are presented as the table prob_tab (num number, weight number), where the first column represents the set numbers to be drawn from and the second the probability of each element presented as a weighing factor. Hence the law is also relatively arbitrary. The sample drawn is inserted into a table sample_tab (num number,freq number), where freq counts the occurences of num. I assume both tables to be present at runtime.

PROCEDURE draw_sample (p_size IN NUMBER) AS
    -- define datatypes to store the probability distribution
  type  prob_rec is record (weight number, distrib number);
  type  prob_tab is table of prob_rec index by pls_integer;
  prec        prob_rec;
  prob        prob_tab;
    total_weight  number;
  -- set max.precision for the uniform distribution on 0<=x<=1
  -- dbms_random.random produces -2^31 <= integer <= 2^31
  uni_prec      constant number := power(2,31);
  uni           number;
  -- load probability data 
  cursor c_prob is 
    select * from prob_dist order by num asc;
  cursor c_sample (p_num number) is 
    select 'x' from sample_tab where num=p_num;
  i             number;
  cumul_weight  number;
  delete sample_tab;
  select sum(weight) into total_weight from prob_dist;
  for r_prob in c_prob loop
    -- initialize sample_tab table 
    insert into sample_tab values (r_prob.num,0);
    -- we need the prob.distribution function
    select sum(weight) into cumul_weight
      from prob_dist
        where num <= r_prob.num;
    prob(r_prob.num) := prec;
    end loop;
    -- draw the sample
  for t in 1..p_size loop
    -- draw a unif.distributed random number
    uni := mod(abs(dbms_random.random),uni_prec)/uni_prec;
    -- find the corresp.subinterval of [0,1]
    i := prob.first; 
      exit when uni <= prec.distrib;
      i :=;
      exit when (i is null);
    end loop;
    -- store the sample
    if (i is not null) then
      update sample_tab set freq = freq + 1 where num = i;
    end if;
  end loop;

The trick consists of partitioning the unit interval into subintervals of length corresponding to the weight of the numbers in the sample space and determining in which subinterval a dbms_random gererated uniformly distributed number falls. The index of the subinterval yields the number drawn.

Put a little more mathematically, what is essentially does is apply the generalized or Penrose inverse of the desired probabilty distribution function to a (continuous) uniformly distributed (on 0<=x<=1) random variable. This yields a random variable with the correct distribution.

The code can of course be generalized allow for sampling from a set of strings but my point should be clear.

Next Post

Back to the future (Oracle 4.1 VM appliance)

We had some nice KC-DBA (expertise meetings) meetings in 2005. One of them was about virtual machines (VMware basically). To show how you could use virtual machine technology to implement an environment which would be almost undoable nowadays to realize, a virtual machine environment was demonstrated with Oracle 4.1 installed based on MSDOS 6.22. Thanks to Carel Jan Engel (one the infamous oaktable guys) - i was able to build a default Oracle 4.1 database environment with all the goodies like the UFI (the user friendly interface).