Definition
A group of people trying to compile a list of genres needs a very solid definition of a genre, especially since members of the group have already shown signs of disagreement over what a genre is. I will elaborate on the existing definition, but help from people better versed in literary theory is needed.
A genre category or class is a set of documents with the same style/technique, form/format, content and/or function/purpose (typing define:genre into Google gives a number of definitions agreeing on the first three items). Since style, technique, form and format are all closely related, let us say that a genre is defined by form, content and purpose. Purpose is in the mind of the author, so it has to be inferred from form and content and is unlikely to be directly useful for automatic identification of genres. However, it is quite useful for human understanding. The content aspect is not particularly desirable from the information-retrieval perspective, since content or topic is typically already captured by keywords, but I suspect we will not be able to define genres that are completely orthogonal to topics. Should we try, though?
Marina (30 Nov 2007): This has been a frantic year of conferences and workshops, where I had the opportunity to talk to many different people from three communities: Corpus Linguistics, Computational Linguistics, Information Retrieval. I got different kinds of suggestions and advice.
The main suggestion from IR is: explain why genre classes are useful, and why we should bother about them in a search engine. What is an "editorial"? What does "genre" mean? [to be continued]
Robert Amsler (15 July 2007): To me the phrase 'genre' is incomplete. It should be 'genre of xxxx' where xxxx is some subset more specific than 'documents'. While you've set 'documents' as the top node, it might be helpful to see how 'genre' is used in other media, e.g., if I say 'genre of television shows' or 'genre of movies' these seem clearer. So perhaps 'documents', while abstract enough for everything, is in fact preventing 'genre' (also abstract) from settling down to a reasonable set of member categories. I'd try things like 'genre of correspondence' or 'genre of emails' or 'genre of books' etc. and flesh out those lists. Work from the bottom up to see where those lists converge.