# Simulating sampling in PL/SQL

Jan Jansen

The standard PL/SQL package dbms_random only allows for drawing uniformly distributed (all numbers equally likely) random numbers. As far as I know, other discrete probability distributions have to be simulated. This is easily accomplished using a little elementary probability theory.

The following code allows for drawing a sample of a size to be specified at runtime from a discrete probability distribution on an arbitrary (though unfortunately necessarily finite) set of positive integers. It is just meant for illustrating a standard math trick and is neither robust nor optimized (nor wellformed :-(. It’s just a quick hack. Perhaps I will look into this.

The sample space used and its discrete probability distribution are presented as the table prob_tab (num number, weight number), where the first column represents the set numbers to be drawn from and the second the probability of each element presented as a weighing factor. Hence the law is also relatively arbitrary. The sample drawn is inserted into a table sample_tab (num number,freq number), where freq counts the occurences of num. I assume both tables to be present at runtime.

```CREATE OR REPLACE
PROCEDURE draw_sample (p_size IN NUMBER) AS
-- define datatypes to store the probability distribution
type  prob_rec is record (weight number, distrib number);
type  prob_tab is table of prob_rec index by pls_integer;

prec        prob_rec;
prob        prob_tab;
total_weight  number;
-- set max.precision for the uniform distribution on 0<=x<=1
-- dbms_random.random produces -2^31 <= integer <= 2^31
uni_prec      constant number := power(2,31);
uni           number;

-- load probability data
cursor c_prob is
select * from prob_dist order by num asc;
cursor c_sample (p_num number) is
select 'x' from sample_tab where num=p_num;

i             number;
cumul_weight  number;

BEGIN
delete sample_tab;
select sum(weight) into total_weight from prob_dist;
for r_prob in c_prob loop
-- initialize sample_tab table
insert into sample_tab values (r_prob.num,0);

-- we need the prob.distribution function
select sum(weight) into cumul_weight
from prob_dist
where num <= r_prob.num;

prec.weight:=r_prob.weight;
prec.distrib:=cumul_weight/total_weight;
prob(r_prob.num) := prec;
end loop;
-- draw the sample
for t in 1..p_size loop
-- draw a unif.distributed random number
uni := mod(abs(dbms_random.random),uni_prec)/uni_prec;

-- find the corresp.subinterval of [0,1]
i := prob.first;
loop
prec:=prob(i);
exit when uni <= prec.distrib;
i := prob.next(i);
exit when (i is null);
end loop;

-- store the sample
if (i is not null) then
update sample_tab set freq = freq + 1 where num = i;
end if;
end loop;
commit;
END;```

The trick consists of partitioning the unit interval into subintervals of length corresponding to the weight of the numbers in the sample space and determining in which subinterval a dbms_random gererated uniformly distributed number falls. The index of the subinterval yields the number drawn.

Put a little more mathematically, what is essentially does is apply the generalized or Penrose inverse of the desired probabilty distribution function to a (continuous) uniformly distributed (on 0<=x<=1) random variable. This yields a random variable with the correct distribution.

The code can of course be generalized eg.to allow for sampling from a set of strings but my point should be clear.

## Back to the future (Oracle 4.1 VM appliance)

We had some nice KC-DBA (expertise meetings) meetings in 2005. One of them was about virtual machines (VMware basically). To show how you could use virtual machine technology to implement an environment which would be almost undoable nowadays to realize, a virtual machine environment was demonstrated with Oracle 4.1 installed based on MSDOS 6.22. Thanks to Carel Jan Engel (one the infamous oaktable guys) - i was able to build a default Oracle 4.1 database environment with all the goodies like the UFI (the user friendly interface).