In the first version of this service, all PDFs were downloaded to a local folder with the 'wget' program on OS X. Those PDFs were then converted to HTML with 'pdftohtml' (also on OS X) so that they could be indexed.
Since then, a sync program has been written in Python that picks up any new documents available on the original site and converts them so they can be added to the search index.
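For illustration only, the sync step might look roughly like the sketch below. This is not the actual sync program: the function names, the folder layout, and the choice of 'pdftohtml' flags ('-noframes' and '-q') are all assumptions made for the example. It finds PDFs that have no local copy yet and builds the conversion command for each one.

```python
import subprocess
from pathlib import Path


def find_new_documents(remote_names, local_dir):
    """Return remote PDF names that have no local copy yet, sorted by name.

    remote_names is a hypothetical list of file names found on the original
    site; how that list is obtained is outside the scope of this sketch.
    """
    local_dir = Path(local_dir)
    have = {p.name for p in local_dir.glob("*.pdf")} if local_dir.is_dir() else set()
    return sorted(name for name in remote_names if name not in have)


def conversion_command(pdf_path, html_dir):
    """Build a pdftohtml command line for one PDF.

    Assumed flags: -noframes (single HTML file) and -q (quiet).
    """
    pdf_path = Path(pdf_path)
    out_base = Path(html_dir) / pdf_path.stem
    return ["pdftohtml", "-noframes", "-q", str(pdf_path), str(out_base)]


def convert(pdf_path, html_dir):
    """Run the conversion; requires pdftohtml to be installed and on PATH."""
    subprocess.run(conversion_command(pdf_path, html_dir), check=True)
```

Something like find_new_documents(names, "pdfs") would give the list to download, and convert() would then be called on each downloaded file before re-indexing.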
Xapian is the underlying search engine framework, and Omega is the layer above it that provides indexing control via 'omindex'.
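As a rough sketch of how the indexing step could be driven from Python, the snippet below wraps an 'omindex' call. The database path, document folder, and base URL are placeholders, not the values this service actually uses, and the helper names are made up for the example. It relies only on the '--db' and '--url' options of 'omindex'.

```python
import subprocess


def omindex_command(db_path, doc_dir, base_url="/"):
    """Build an omindex command line.

    db_path, doc_dir, and base_url are hypothetical placeholders: the Xapian
    database to update, the folder of converted HTML, and the URL prefix
    under which those files are served.
    """
    return ["omindex", "--db", str(db_path), "--url", base_url, str(doc_dir)]


def reindex(db_path, doc_dir, base_url="/"):
    """Run omindex; requires Xapian Omega to be installed and on PATH."""
    subprocess.run(omindex_command(db_path, doc_dir, base_url), check=True)
```

For example, reindex("/var/lib/omega/data/default", "html") would index everything under the html folder into that database, assuming those paths exist on the machine.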
Cheers, ray
Ray Haleblian
twitter: rhaleblian
email: ray ET haleblian DOOT com