README  30 October 1992 by Paul Leyland (pcl@black.ox.ac.uk)

This directory (/wordlists) contains a number of sub-directories, each
containing compressed wordlists by subject. The "Random" directory is a
catch-all.

Beware. There are rather a lot of words in total; there are even quite a
few thousand duplicates.

If anyone has wordlists of languages and topics not present here, please
drop a line to pcl@black.ox.ac.uk, telling me how I can get hold of them.

All the individual files in the Klein lists are here (not the aggregated
all-words), as are many foreign language dictionaries and words from a
number of technical and leisure fields.

Henk Smit, enk@cs.vu.nl, from whom I acquired many of these lists, writes:

=======================================
These are the dictionaries I have found, their sizes, and where I got
them from.

Dutch: 178429 words, 1998881 bytes, 779056 bytes compressed.
  This list is made out of some smaller lists:
    het Groene Boekje (available at donau.et.tudelft.nl)
    TeX dutch wordlist (available at archive.cs.ruu.nl)
    local additions at de Vrije Universiteit (cs.vu.nl)

German: There are two lists, germanl.Z and words.german.Z.
  germanl.Z: 27342 words, ? bytes, 137591 bytes compressed.
  words.german.Z: 160086 words, 2060734 bytes, 761528 compressed.
  both from ftp.informatik.tu-muenchen.de:/pub/doc/dict

Italian: 60453 words, 561982 bytes, 217241 bytes compressed.
  David Vincenzetti
  ghost.unimi.it:/pub/voc.Z

Norwegian: 61843 words, 589234 bytes, 258162 bytes compressed.
  Anders Ellefsrud, ftp.ifi.uio.no:/pub/dicts/norwegian-words.Z

Swedish: 23688 words, 200853 bytes, 96169 bytes compressed.

Finnish: 280475 words, 3340963 bytes, 1329070 bytes compressed.
  ftp.uu.net:/doc/dictionaries/Finnish

Japanese: 115600 words, 935022 bytes, 403986 bytes compressed.
  ftp.waseda.ac.jp:/pub/security/wordlists

names/Family-Names.Z and names/Given-Names.Z:
  Family-Names: 13484 names, 106780 bytes, 57749 bytes compressed.
  Given-Names: 8608 names, 60271 bytes, 31136 bytes compressed.
  Andrew Macpherson available on bnrgate.bnr.co.uk.

names/names.french.Z and names/names.hp.Z:
  names.french: 702 names, 5315 bytes, 3023 bytes compressed.
  names.hp: 44554 names, 430014 bytes, 188971 bytes compressed.
  Dan Kegel available on blacks.jpl.nasa.gov:/pub/security/wordlists

names/surnames.finnish.Z:
  713 names, 4488 bytes, 2428 compressed.
  ftp.uu.net:/doc/dictionaries/Finnish
=======================================

Here's the 0-Index file from another repository. All the following files
are here somewhere. This lot was originally collected together by Don
Olivier, don@hsph.harvard.edu, but has since diffused around the world to
several ftp archives.

=======================================
Antworth       @ Big dictionary, includes many inflected forms
CIS            @ Words and names from Current Index to Statistics (partial)
CRL.words      @ Dictionary from Center for Research in Lexicography
Congress       @ Names and nicknames of U. S. Congressmen
Domains        @ Internet domains
Dosref         @ Words from the DOS Technical Reference Manual
Ethnologue     @ Words from the "Ethnologue Database"
Ftpsites       @ Anonymous ftp sites
Jargon         @ Words from the Jargon File
Koran          @ Words from the Koran
LCarrol        @ Words from AliceIW, AliceTTLG, Snark
Movies         @ Characters, actors, and titles from thousands of movies
Paradise.Lost  @ Words from P. L. (a touch of class)
Python         @ Words and names from M. P. scripts
Roget.words    @ Words from 1911 R's Thesaurus
Trek           @ Words and names from Star Trek plot summaries
Unabr.dict     @ A big unabridged dictionary
World.factbook @ Words, names, many acronyms from the CIA World Factbook
Zipcodes       @ All U. S. post offices (except the last half of Alaska)

Words in /usr/dict/words deleted from all these lists.
Words in Dan Klein's suite of lists deleted from several of them
(that's why "klingon" doesn't appear in "Trek").
=======================================
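
The notes above mention two clean-up steps: removing duplicates, and
deleting words that already appear in a base list such as /usr/dict/words.
What follows is only a rough sketch of how one might do the same thing in
Python; it is not the procedure the original collectors used. The function
names (dedupe, subtract, read_words), the latin-1 encoding, the exact
case-sensitive matching, and the sample words are all assumptions made for
illustration. The files must already be uncompressed, one word per line.

```python
from typing import Iterable, List


def dedupe(words: Iterable[str]) -> List[str]:
    """Remove duplicates, keeping the first occurrence of each word in order."""
    seen = set()
    result = []
    for word in words:
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result


def subtract(words: Iterable[str], base: Iterable[str]) -> List[str]:
    """Keep only the words that are absent from base (exact, case-sensitive)."""
    base_set = set(base)
    return [word for word in words if word not in base_set]


def read_words(path: str) -> List[str]:
    """Read one word per line from an uncompressed file, skipping blank lines.

    latin-1 is an assumption; it accepts any byte value, which suits
    wordlists of unknown origin.
    """
    with open(path, encoding="latin-1") as handle:
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    # Made-up sample data standing in for a topical list and a base list.
    topical = ["klingon", "vulcan", "warp", "vulcan", "phaser"]
    base = ["warp", "klingon"]
    print(subtract(dedupe(topical), base))  # prints ['vulcan', 'phaser']
```

With real files one would call read_words() on each uncompressed list and
pass the results to the two functions in the same way.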