Genre Classes
From WebGenreWiki
This list of categories is meant as a basis for discussions, not as a final list. Please feel free to add new categories (sorted in alphabetical order) or comments - but don't delete them without discussing this first. If you add a new category, it might be useful to add a short description as well to avoid misunderstanding (maybe as a new wiki-page). For a start, all genres listed in the LREC draft are included.
Genre Classes
A group of people trying to compile a list of genres needs a very solid definition of a genre, especially since members of the group have already shown signs of disagreement over what a genre is. I will elaborate on the existing definition, but help from people better versed in literary theory is needed.
A genre category or class is a set of documents with the same style/technique, form/format, content and/or function/purpose (typing define:genre into Google gives a number of definitions agreeing on the first three items). Since style, technique, form and format are all pretty similar things, let us say that a genre is defined by form, content and purpose. Purpose is in the mind of the author, so it needs to be inferred from form and content and is not likely to be directly useful for automatic identification of genres. However, it is quite useful for human understanding. The content aspect is not particularly desirable from the information-retrieval perspective, since content or topic is typically determined by keywords, but I suspect we will not be able to define genres completely orthogonal to topics. Should we try, though?
Genres can be narrower or broader: they range from genres like "news" to supergenres like "journalism", forming hierarchies based on a subcategory IS A supercategory relation, like "news IS journalism". Narrow genres are more likely to be tied to particular content: even though "news" can be just about anything, "weather report" cannot. I think if we allow content to be a part of some genres, this should be kept at the lowest level of the hierarchy.
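The IS A hierarchy described above can be sketched in a few lines of Python. This is only an illustration, not an agreed taxonomy: "news IS journalism" comes from the text, while placing "weather report" under "news" and marking it as content-tied are assumptions made for the example, and all function names are hypothetical.

```python
# Hypothetical IS-A links (child -> parent). Only "news IS journalism"
# is taken from the text; the "weather report" link is an assumption.
PARENT = {
    "weather report": "news",
    "news": "journalism",
}

# Assumption for illustration: genres we consider tied to particular content.
CONTENT_TIED = {"weather report"}


def ancestors(genre):
    """Return the chain of supergenres above `genre`, nearest first."""
    chain = []
    while genre in PARENT:
        genre = PARENT[genre]
        chain.append(genre)
    return chain


def is_a(genre, supergenre):
    """True if `genre` equals `supergenre` or lies somewhere below it."""
    return genre == supergenre or supergenre in ancestors(genre)


def is_leaf(genre):
    """True if no other genre names `genre` as its parent."""
    return genre not in PARENT.values()


def content_only_at_lowest_level(content_tied=CONTENT_TIED):
    """Check the proposed rule: content-tied genres sit only at the leaves."""
    return all(is_leaf(g) for g in content_tied)
```

With these assumed links, `is_a("weather report", "journalism")` holds by transitivity, and `content_only_at_lowest_level()` would fail as soon as a content-tied genre acquired subgenres of its own.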
Andrea: I absolutely disagree that content is part of the definition of genre. Even though it sometimes happens that some genres only (or never) contain texts on a certain topic, it doesn't mean this topic is characteristic of one genre. Also, if you have a look at our current list of genres, none except children's and pornographic are related to topic. So I would use the definition proposed or used by Dewe, Karlgren and Cutting, Finn and Kushmerick, Roussinov, Crowston and Kwasnik as well as Meyer zu Eissen/Stein and, to some extent, Boese and Howe (I guess you all know their work): Documents that share a specific form (= style + layout) and function/purpose. So we should try.
Mitja: On one hand, let me quote a review of our RANLP paper: A good rule of thumb is that, if you cannot decide if something is a genre or not, it should be possible to instantiate a genre (in other words, to write a text that belongs to a certain genre). Using this not really scientific rule of thumb, it becomes clear that "weather report", "love letter", "academic paper", "shopping list" and "FAQ document" are indeed genres (because you can write instances of these genres), but "adult", "children's", "community", "content delivery", "gateway", "informative", "official", "journalistic", "personal", "scientific" and several others you mention are not. "Weather report" and "love letter" are definitely topical and one can easily think of more genres like that - for example "CV", "obituary", "job application" ... On the other hand, the genres below are mostly not topical and somebody claims that "weather report" is not a genre. If we included in our corpus every concrete genre such as those listed by the reviewer (which are often topical), we would end up with an unmanageable number of them. So we will have to prepare our corpus at a more abstract level and each of the corpus genres will contain several concrete subgenres. I suggest that we do not allow the corpus (abstract) genres to be topical, but that we simply admit that many of the concrete subgenres are topical. We could ignore those subgenres, but I think they will be helpful in corpus preparation because they are much easier to identify than the more abstract ones.
Mikael: I think it is better to avoid such terms as 'content'. From my point of view, form expresses content, but content is not restricted to topic (or theme, or subject matter). A text is a complex expression, and the dichotomy of Saussure implies a serious simplification of the act of text production. A love letter can certainly be considered topical (topic = the feeling of love belonging to the author) but 'love letter' may equally well be the label of a genre: a piece of text written in praise of the loved one, intended to be read by this loved one only. Is the content of a love letter not what it tries to accomplish rather than what it is about? The praise (if it is written in that way), rather than the author's love. Or at least both. Clearly, this example illustrates that the genre 'love letter' may differ from time to time. Is it not still a love letter if the author only expresses his agony over his love of the beloved, instead of praising her?
Marina: (30 Nov. 2007) Gosh! this is really an entangled maze, and we are trapped in it! Where's the way out? We can start by saying that:
- we all agree on the fact that genres of written documents (paper, electronic or web documents) are generated by a recurrent action, or practice, performed by a group of people, or community. This community can be of any size, small or large, so the use (or visibility) of a genre can be smaller or larger. An interview in a newspaper is a widely recognized genre, while an e-zine is probably a niche genre.
- we all agree that we can identify some kind of hierarchical or whole-part relationship in genre categories. If we take "Journalism" as a super-genre (as opposed to, say, fiction), we have several categories (editorial, feature article, column, letter to the editor, etc.) There are many other distinctions that could be made with genre classes, such as subgenre, but also see the concept of "conglomerate" and "bound" and "free" genres.
We could go on forever with the discussion of what a genre class is. But for the time being I would like to suggest a practical hint:
- as we want to incorporate genre classes in a search engine, we should first have a look at query logs. Of course this is not trivial, because query logs are precious stuff, and nobody will give them to us spontaneously. So take it as wishful thinking.
What are the genre classes that are interesting for users? I have the impression that user studies have some limitations (to be expanded). If we could analyse the query logs of a major search engine, we could immediately work out a possible genre palette (sorry, Georg, but I like this term better than "set of genres"), and then decide the level of hierarchy we need. Also, if we had classes like editorials or weather reports, we could decide whether it is worth adding them to a future genre palette, because apparently just specifying "editorials", "blogs", "faqs" and "weather reports" as keywords in Google is enough. There is no need for a specialized search engine to retrieve them. It might be different for other genres, like "research report" or "bulletin news".
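If query logs were available, a first pass could be as simple as counting how often users already type a genre label as a keyword. The sketch below is purely illustrative: the label list is taken from the examples in the paragraph above, the function name is hypothetical, and the matching is crude substring matching (so "blogs" counts as "blog"), which a real study would need to refine.

```python
from collections import Counter

# Genre labels mentioned above, in singular form so that plurals also match.
GENRE_LABELS = ["editorial", "blog", "faq", "weather report"]


def count_genre_labels(queries, labels=GENRE_LABELS):
    """Count, per label, the number of queries that contain it."""
    counts = Counter()
    for query in queries:
        text = query.lower()
        for label in labels:
            if label in text:  # crude substring match, an assumption
                counts[label] += 1
    return counts
```

Labels that turn up often in the log would be candidates for the genre palette; labels that never appear might still matter, but would need evidence from elsewhere.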
I think it is also important to identify the level of genre granularity or homogeneity that a genre class should have for retrieval purposes. Genre classes are going to be handled by mathematical/statistical algorithms, which are not very sensitive to users' needs. These algorithms have other, more formal, requirements. Is it more effective and efficient to search for broader classes rather than very specific classes, or vice versa? To what extent is it possible to distinguish among similar, but separate, genres, for instance between "online tutorial" and "how-to", or "editorial" and "sermon"? How accurate is the retrieval of these classes with a regular search engine? Can we improve this accuracy with genre features?
So if the aim is practical, i.e. we want to increase the level of information access, we should first focus on what is not covered by current search engines. For this purpose, i.e. for a retrieval purpose, we need a TREC-like benchmark, with queries and relevant documents.
For the study of web genres in themselves, as human artifacts, we need something different. To start with, it would be handy to have a list of the genres that have been adapted and/or created for the web, together with a set of (potentially automatically extractable) features that can help us discriminate among them.
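To make "queries and relevant documents" concrete, here is a minimal sketch of how such a benchmark could be scored. It assumes the simplest possible setup: for each query, a set of document ids judged relevant and a ranked list returned by some system. The function name and the document ids are hypothetical, and real TREC evaluations use richer measures than these two.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall over the top k results of one ranked list.

    ranked   -- list of document ids, best first
    relevant -- set of document ids judged relevant for the query
    k        -- cutoff rank
    """
    top = ranked[:k]
    hits = sum(1 for doc in top if doc in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgements and system output for a single query.
relevant_docs = {"d1", "d3"}
system_ranking = ["d1", "d2", "d3", "d4"]
print(precision_recall_at_k(system_ranking, relevant_docs, 2))
```

Running the same queries with and without genre features, and comparing the scores, would be one way to test whether genre information improves retrieval accuracy.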
Of course, practical aims and theoretical studies overlap, because in order to retrieve genres on the web we need to know what are the best features for automatic classification.
Well, my overall hint is that we should start disentangling the different issues. If we do not try to unravel the different tracks in genre research, we will keep on being very confused. [this is my contribution of the day :-) ]
Mirko (3 Dec 2007): I’m not so sure that we can extract a strong collection of genres from logs. We will probably find a folksonomy, not a true taxonomy. It is useful instead to take a look at the genre-aware Google services, including those in development (which are presumably built after careful consultation of the logs...):
Web search
Group search
News search (with subsections, e.g. Google UK News search includes: World, UK, Business, Sci/Tech, Sport, Entertainment, Health, Most Popular)
Blog search
Scholar
Code Search
Glossary (= find definitions)
Apart from that, I suspect that users will actually and consistently use only three of the items already assigned to our list for their searches: “reviews”, “recipes” and “weather report”. Should we focus on them?
Marina (5 Dec 2007): I can see that Blog is a (super)genre, and that a glossary can be a genre. But the rest of the Google categories are very thematic. They are useful categories indeed, but they look more like domains.
List of genres to be discussed
Marina: Since for me the concept of genre is very much related to:
* recurrent linguistic conventions
* recurrent layout conventions
* recurrent discourse organization
* recurrent distribution of discourse functions (eg. an editorial is often argumentative)
* a genre name (I will try to say what a "genre name" is tomorrow)
I would save:
- (supergenres): blogs, drama, forum, homepage, index, poetry, prose, fiction [...];
- (genre): commentary, advertisements, discussion, error message, FAQs, feature story, chat, interview, reportage, recipe, eshop, schedule [...];
- I see topic interacting with genre at subgenre level (e.g. scientific reports) or at website level (e.g. academic personal home page). However, it is not as simple as that: I can't decide if "weather report" is a subgenre or a genre in itself. What does it share with the other kinds of "reports"? Uhmm... I think it is true what I read recently, somewhere, i.e. genres can be characterized at different levels. There are genres that are characterized more at layout level, for example for a shopping list (see Swales) or a hotlist the layout is very important; some other genres are more visual, for example catalogues or product lists; some others are more content-based (it might be the case of "weather report", or something like the "thriller" genre); some others are characterized by the discourse functions, e.g. a story is often narrative. All these levels can mix, e.g. a sermon is argumentative/persuasive and is often about religious themes...
Mikael: I think this list largely reflects both common names for document categories and attempts to capture hitherto unlabelled categories. To be really useful and subsequently furnished with labels, such a list has to be compiled by applying descriptions that consistently adhere to the same aspect of a text. For example, a 'PhD thesis' serves the purpose of gaining a PhD degree, a download serves the purpose of giving access to some resource(s) that is not intended for reading. But I have difficulty determining e.g. what an 'article' should be in this respect. This is a category of the MeyerZuEissen corpus, but it contains contributions to e-journals, glossaries, project summaries, annotated link lists and more material that does not fit well with any other category in this list. I believe, when people say that they want an article on this or that topic, they are mostly referring to journal contributions. But if the definition of 'articles' is that it is part of a journal, it has less to do with genre.
Want to share with you: Naming is a very difficult issue. Several of my fellow colleagues (teachers) refer to articles with the implicit denotation of "contributions to scientific journals"; students, however, tend to interpret this word as referring to newspaper contributions only. In addition, some students show evidence that they do not understand the word 'thesis' (avhandling in Swedish) and tend to regard scientific articles as 'theses'. If genre classification is for users, what do we do about this?
List of Genres
