IBM Quest Market-Basket Synthetic Data Generator
The C++ source code can be downloaded from
www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData.
However, it will not compile as distributed under g++ on Solaris; some small
changes are needed. I've included links to the corrected files below. After
downloading all the files, simply type "make" at the command line. An executable
file named "gen" will be created. Running "gen lit help" will explain the input
parameters (the IBM Quest website above has additional information).
I offer no guarantees that the files below are free of additional bugs. If
you discover any, please let me know. I am keeping a Bug List.
The generator can produce several different output formats. My favorite is the
ASCII format. At the command line enter:
    executable_filename lit -ascii -ntrans XX -tlen YY -nitems ZZ > TXXLYYNZZ.data
This will produce XX x 1000 transactions with an average of YY items per
transaction, drawn from a total of ZZ x 1000 items.
The output will be written to file TXXLYYNZZ.data. Each line of the file is a transaction.
The items in each transaction are represented by item numbers and are separated by spaces.
Some additional files will also be generated; these can be ignored. To get
command-line help, enter: executable_filename lit help.
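
To illustrate the ASCII layout described above (one transaction per line, item numbers separated by spaces), here is a minimal C++ sketch of a reader for such a file. It is not part of the IBM Quest code. The names Transaction, read_transactions, and average_length are hypothetical and chosen only for this example, and the sketch assumes every token on a line is an integer item number.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One transaction = the item numbers found on one line of the file.
// (Hypothetical alias, used only in this sketch.)
using Transaction = std::vector<int>;

// Reads transactions from a stream that holds one transaction per line,
// with item numbers separated by whitespace. Blank lines are skipped.
// Assumption: every token on a line is an integer item number.
std::vector<Transaction> read_transactions(std::istream& in) {
    std::vector<Transaction> transactions;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Transaction items;
        int item;
        while (fields >> item) {
            items.push_back(item);
        }
        if (!items.empty()) {
            transactions.push_back(items);
        }
    }
    return transactions;
}

// Returns the mean number of items per transaction (0.0 if there are none).
// This can be compared against the YY value passed to -tlen as a sanity check.
double average_length(const std::vector<Transaction>& transactions) {
    if (transactions.empty()) {
        return 0.0;
    }
    std::size_t total = 0;
    for (const Transaction& t : transactions) {
        total += t.size();
    }
    return static_cast<double>(total) / transactions.size();
}
```

To use it on a generated file, open the file with std::ifstream (from <fstream>) and pass the stream to read_transactions. The value returned by average_length should then be roughly YY, and the number of transactions read should be roughly XX x 1000.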