# Simulating sampling in PL/SQL

The standard PL/SQL package dbms_random only allows for drawing uniformly distributed (all numbers equally likely) random numbers. As far as I know, other discrete probability distributions have to be simulated. This is easily accomplished using a little elementary probability theory.

The following code allows for drawing a sample of a size to be specified at runtime from a discrete probability distribution on an arbitrary (though unfortunately necessarily finite) set of positive integers. It is just meant for illustrating a standard math trick and is neither robust nor optimized (nor wellformed :-(. It’s just a quick hack. Perhaps I will look into this.

The sample space used and its discrete probability distribution are presented as the table prob_tab (num number, weight number), where the first column represents the set numbers to be drawn from and the second the probability of each element presented as a weighing factor. Hence the law is also relatively arbitrary. The sample drawn is inserted into a table sample_tab (num number,freq number), where freq counts the occurences of num. I assume both tables to be present at runtime.

```CREATE OR REPLACE
PROCEDURE draw_sample (p_size IN NUMBER) AS
-- define datatypes to store the probability distribution
type  prob_rec is record (weight number, distrib number);
type  prob_tab is table of prob_rec index by pls_integer;

prec        prob_rec;
prob        prob_tab;
total_weight  number;
-- set max.precision for the uniform distribution on 0<=x<=1
-- dbms_random.random produces -2^31 <= integer <= 2^31
uni_prec      constant number := power(2,31);
uni           number;

cursor c_prob is
select * from prob_dist order by num asc;
cursor c_sample (p_num number) is
select 'x' from sample_tab where num=p_num;

i             number;
cumul_weight  number;

BEGIN
delete sample_tab;
select sum(weight) into total_weight from prob_dist;
for r_prob in c_prob loop
-- initialize sample_tab table
insert into sample_tab values (r_prob.num,0);

-- we need the prob.distribution function
select sum(weight) into cumul_weight
from prob_dist
where num <= r_prob.num;

prec.weight:=r_prob.weight;
prec.distrib:=cumul_weight/total_weight;
prob(r_prob.num) := prec;
end loop;
-- draw the sample
for t in 1..p_size loop
-- draw a unif.distributed random number
uni := mod(abs(dbms_random.random),uni_prec)/uni_prec;

-- find the corresp.subinterval of [0,1]
i := prob.first;
loop
prec:=prob(i);
exit when uni <= prec.distrib;
i := prob.next(i);
exit when (i is null);
end loop;

-- store the sample
if (i is not null) then
update sample_tab set freq = freq + 1 where num = i;
end if;
end loop;
commit;
END;```

The trick consists of partitioning the unit interval into subintervals of length corresponding to the weight of the numbers in the sample space and determining in which subinterval a dbms_random gererated uniformly distributed number falls. The index of the subinterval yields the number drawn.

Put a little more mathematically, what is essentially does is apply the generalized or Penrose inverse of the desired probabilty distribution function to a (continuous) uniformly distributed (on 0<=x<=1) random variable. This yields a random variable with the correct distribution.

The code can of course be generalized eg.to allow for sampling from a set of strings but my point should be clear.