The standard PL/SQL package dbms_random only allows for drawing uniformly distributed (all numbers equally likely) random numbers. As far as I know, other discrete probability distributions have to be simulated. This is easily accomplished using a little elementary probability theory.
The following code allows for drawing a sample of a size to be specified at runtime from a discrete probability distribution on an arbitrary (though unfortunately necessarily finite) set of positive integers. It is just meant for illustrating a standard math trick and is neither robust nor optimized (nor wellformed :-(. It’s just a quick hack. Perhaps I will look into this.
The sample space used and its discrete probability distribution are presented as the table prob_tab (num number, weight number), where the first column represents the set numbers to be drawn from and the second the probability of each element presented as a weighing factor. Hence the law is also relatively arbitrary. The sample drawn is inserted into a table sample_tab (num number,freq number), where freq counts the occurences of num. I assume both tables to be present at runtime.
CREATE OR REPLACE PROCEDURE draw_sample (p_size IN NUMBER) AS -- define datatypes to store the probability distribution type prob_rec is record (weight number, distrib number); type prob_tab is table of prob_rec index by pls_integer; prec prob_rec; prob prob_tab; total_weight number; -- set max.precision for the uniform distribution on 0<=x<=1 -- dbms_random.random produces -2^31 <= integer <= 2^31 uni_prec constant number := power(2,31); uni number; -- load probability data cursor c_prob is select * from prob_dist order by num asc; cursor c_sample (p_num number) is select 'x' from sample_tab where num=p_num; i number; cumul_weight number; BEGIN delete sample_tab; select sum(weight) into total_weight from prob_dist; for r_prob in c_prob loop -- initialize sample_tab table insert into sample_tab values (r_prob.num,0); -- we need the prob.distribution function select sum(weight) into cumul_weight from prob_dist where num <= r_prob.num; prec.weight:=r_prob.weight; prec.distrib:=cumul_weight/total_weight; prob(r_prob.num) := prec; end loop; -- draw the sample for t in 1..p_size loop -- draw a unif.distributed random number uni := mod(abs(dbms_random.random),uni_prec)/uni_prec; -- find the corresp.subinterval of [0,1] i := prob.first; loop prec:=prob(i); exit when uni <= prec.distrib; i := prob.next(i); exit when (i is null); end loop; -- store the sample if (i is not null) then update sample_tab set freq = freq + 1 where num = i; end if; end loop; commit; END;
The trick consists of partitioning the unit interval into subintervals of length corresponding to the weight of the numbers in the sample space and determining in which subinterval a dbms_random gererated uniformly distributed number falls. The index of the subinterval yields the number drawn.
Put a little more mathematically, what is essentially does is apply the generalized or Penrose inverse of the desired probabilty distribution function to a (continuous) uniformly distributed (on 0<=x<=1) random variable. This yields a random variable with the correct distribution.
The code can of course be generalized eg.to allow for sampling from a set of strings but my point should be clear.