Genres on the web (book)
Book plan (Sunday 19 April: Marina)
List of Contributors
- Riding the Rough Waves of Genre on the Web
by Marina, Alexander, Serge
PART I - Identifying the Sources of Web Genres
- Conventions and Mutual Expectations - Understanding Sources for Web Genres
by J. Karlgren
- Genre Connectivity and Genre Drift in a Web of Genres
by L. Björneborn
- Identification of Web Genres by User Warrant
by M. Rosso and S. Haas
- Problems in the Use-Centered Development of a Taxonomy of Web Genres
by K. Crowston, B. Kwasnik and J. Rubleske
PART II - Automatic Web Genre Identification
Chapter 5 (corpus linguistics, vector space, individual web page)
- In the Garden and in the Jungle: Comparing Genre in the BNC and Internet
by S. Sharoff
Chapter 6 (computational linguistics, vector space, individual web page)
- Cross-Testing a Genre Model for the Web
by M. Santini
Chapter 7 (digital library, vector space, individual web page)
- Formulating Representative Features with Respect to Document Genre Classification
by Y. Kim and S. Ross
Chapter 8 (vector space, web sites)
- Classification of Web Sites at Super-Genre Level
by C. Lindemann and L. Littig
Chapter 9 (structure, website)
- A Structure-Oriented Classifier of Web Genre
by M. Dehmer and F. Emmert-Streib
Chapter 10 (IR model)
- Web Genre Analysis: Use Cases, Retrieval Models, and Implementation Issues
by B. Stein, S. Meyer zu Eissen, and N. Lipka
Chapter 11 (IR, ranking)
- Marrying Relevance and Genre Ranking: An Exploratory Study
by P. Braslavski
PART III - Empirical Web Genre Analyses
Chapter 12 (emerging genre identified by a quantitative graph model)
- A Quantitative Graph Model of Social Ontologies
by A. Mehler
- Genre Emergence in Amateur Flash (emerging genre identified by social network analysis)
by J. Paolillo, J. Warren and B. Kunz
- Variations among Blogs: A Multi-Dimensional Analysis (blog typology identified by Multi-Dimensional Analysis)
by J. Grieve, D. Biber, E. Friginal, and T. Nekrasova
- Evolving Genres in Online Domains: The Hybrid Genre of the Participatory News Article (emerging genre identified by qualitative analysis)
by I. Bruce
- Any Land In Sight?
by Marina, Alexander, Serge
Index of Names
- Lennart Björneborn
Genre Connectivity and Genre Drift in a Web of Genres
The chapter outlines an exploratory empirical investigation of genre connectivity in an academic web space, i.e., how web page genres are connected by links. The data set contained source and target pages on shortest link paths between different topical domains at UK universities. The pages were categorized into 9 institutional and 8 personal genre classes (bundled genre categories). The most frequent genre pairs were institutional link lists linking to institutional homepages and personal link lists linking to personal publications. Some genres function as outlink-prone ‘hook’ genres (e.g. link lists) and some as inlink-prone ‘lug’ genres (e.g. institutional homepages). A genre network graph is used to discuss web spaces as webs of genres with genre drift and topic drift, i.e., changes in page genres and page topics along link paths. Complementarities of genre drift and topic drift may affect small-world properties in the shape of short link distances between different topical clusters in academic web spaces.
- Pavel Braslavski
Marrying Relevance and Genre Rankings: an Exploratory Study
In this chapter, we discuss different options for using genre-related information in Web search. We conduct an experiment on merging genre-related and text-relevance rankings using a reference Web collection. We propose a method for automatically extracting a formality score, akin to a readability score, using canonical discriminant analysis applied to a sample of genres with decreasing formality. Effects of aggregating genre-related and text-relevance rankings are considered. Evaluation of the results shows moderate positive effects. The findings suggest further research on implicit use of genre-related information in Web search in the directions of static ranking, query analysis and personalized search.
- Ian Bruce
Evolving genres in online domains: The hybrid genre of the participatory news article
Cognitive science proposes that any category, such as a genre as a category for a certain type of text, is formed in relation to human purpose or intentionality (see Barsalou, 1983; Murphy & Medin, 1985). Grouped in relation to three types of high-level, general purpose for (academic) writing, Young (2006) posits three broad categories of genre: those of personal discourse (such as diaries, journals, notebooks); interactive discourse (letters, emails, fora in publications and other written messages) and public discourse (articles, reports, presentations). However, an outcome of internet-based communication and publication has often been to conflate these general types of writing purpose, resulting in the hybridising of what were previously discrete genres. An example of this conflation of writing purposes leading to the development of a hybrid genre is that of a news article immediately followed by readers' comments, sometimes termed participatory journalism. In this web genre, public discourse (such as the publication of a news article sourced from a press agency) is combined with the interactive discourse of the blog that follows the article, typically including readers' reactive comments on both the content of the article and the views of other readers. In this chapter, this particular hybrid genre will be analysed in terms of the dual approach to genre proposed by the writer (Bruce, 2008a). The chapter first reviews approaches to the notion of genre as a method of categorisation of written texts, leading to the presentation of a rationale for the dual approach of social genre and cognitive genre as being necessary to account for the range of different types of knowledge that combine to identify texts as belonging to a particular genre. An explanation of the knowledge frameworks of the social genre/cognitive genre model is then followed by analysis of a sample of ten texts of the participatory journalism genre. 
The sample is rater-analysed in terms of the social genre aspects of context, epistemology, writer stance/audience addressivity and content staging. The texts are also examined for their use of cognitive genres, involving types of general rhetorical purpose, their related discourse patterns and relations between propositions. The genre modelling and research reported in the chapter is an argument for the notion that an adequate operationalisation of a genre as a category of written texts, including a web genre, should be able to account for the socially constructed, cognitive organisational and linguistic elements of genre knowledge.
- Kevin Crowston, Barbara Kwasnik and Joseph Rubleske
Problems in the Use-Centered Development of a Taxonomy of Web Genres
A document's genre reflects the purpose of a document and as such is potentially useful meta-data to improve search effectiveness. Using genre in an information retrieval system seems to require a taxonomy of genres to provide a controlled vocabulary and to show relations among genres. In this paper, we report on a study to develop a 'bottom-up' genre taxonomy, that is, from the genre terms identified by informants. We collected a total of 767 genre terms from 52 respondents (teachers, journalists and engineers) engaged in natural use of the Web, and reduced this list to a set of 298 genres. We report on various difficulties we encountered in the study. Respondents frequently had difficulty coming up with an unambiguous genre label for a page, offering several possibilities, or applied the same label to many pages. In many cases, respondents could not think of a term, or applied an overly general term, such as an "information page". Furthermore, even when respondents did offer a clear genre term, they often were unable to say what about the page led to that choice. These difficulties seem to reflect underlying problems in the definition of genres as social constructions that have meaning only in use.
- Matthias Dehmer and Frank Emmert-Streib
Mining Graph Patterns in Web-based Systems: A Conceptual View
This chapter discusses a graph-based perspective for automatically analyzing web genre data by mining graph patterns representing web-based hypertext structures. The major purpose of our contribution is to emphasize that an approach entirely different to the vector space model, frequently used in Web mining and related problems, can not only be applied to these problems but is also conceptually more suitable. The graphs in our study are hierarchical and directed and are called generalized trees. Starting from a similarity measure for determining the structural similarity of generalized trees, we discuss some evaluation steps for automatically analyzing web genre data. Finally, connections for the application in Web Structure Mining and Web Usage Mining are indicated.
- Jack Grieve, Douglas Biber, Eric Friginal, and Tatiana Nekrasova
Variation Among Blogs: A Multi-dimensional Analysis
This chapter uses multi-dimensional analysis to investigate functional linguistic variation in internet blogs, with the goal of identifying text types that are distinguished linguistically. A 2 million word corpus of blogs written in American English, sampled across a wide range of topics, is analyzed for this purpose. The corpus is tagged for grammatical information and a factor analysis is carried out to identify the major linguistic patterns of co-occurrence across this corpus. The resultant factors are interpreted as underlying dimensions of functional linguistic variation. The dimensions are subsequently used as predictors in a cluster analysis, which identifies the text types that are linguistically well-defined in this domain of use. These text types are interpreted functionally by reference to the typical thematic domains and communicative purposes of the blogs grouped into each type. Two main sub-types of blogs are identified: personal blogs and thematic blogs.
- Jussi Karlgren
Conventions and Mutual Expectations - Understanding Sources for Web Genres
Genres can be understood in many different ways. They are often perceived as a primarily sociological construction, or, alternatively, as a stylostatistically observable objective characteristic of texts. The latter view is more common in the research field of information and language technology. These two views can be quite compatible and can inform each other; the present investigation discusses knowledge sources for studying genre variation and change by observing reader and author behaviour rather than performing analyses on the information objects themselves.
- Yunhyong Kim and Seamus Ross
Formulating representative features with respect to document genre classification
Genre classification (e.g. whether a document is a scientific article or magazine article) is closely bound to the physical and conceptual structure of documents as well as the level of depth involved in the text. Hence, it provides a means of ranking documents retrieved by search tools according to metrics other than topical similarity. Moreover, the structural information derived from genre classification can be used to locate target information within the text. In previous studies, the detection of genre classes has been attempted by using some normalised frequency of terms or combinations of terms in the document (here, we are using term as a reference to words, phrases, syntactic units, sentences and paragraphs, as well as other patterns derived from deeper linguistic or semantic analysis). These approaches largely neglect how the term is distributed throughout the document. Here, we report the results of automated experiments based on distributive statistics of words in order to present evidence that term distribution pattern is a better indicator of genre class than term frequency.
- Christoph Lindemann and Lars Littig
Classification of Web Sites at Super-genre Level
We present an approach for the classification of Web sites at super-genre level. This approach utilizes both structure and content of Web sites in order to distinguish between eight relevant Web genres. We show that this combination of structural and content-based features considerably improves the classification performance compared to approaches solely based on structure or content. We evaluate our approach on a dataset comprising more than 16,000 Web sites with about 20 million crawled and 100 million known pages. The approach achieves an accuracy of 92% for the classification of these Web sites.
- Alexander Mehler
A Quantitative Graph Model of Social Ontologies
- John C. Paolillo, Jonathan Warren and Breanne Kunz
Genre Emergence in Amateur Flash
Research on genre emergence in digital media often characterizes the emergence of new genres using notions of “community” and “social interaction”. In this chapter, we attempt to provide empirical content to these notions by employing a social network approach. We examine Flash animations posted to Newgrounds.com, in terms of both genre features and favorite author nominations. Results indicate that participants’ social network positions are strongly associated with the genres of Flash they produce. We argue from these findings that the social network positions of Flash authors contribute to the establishment of genre norms, and that a social network approach can be crucial to understanding genre emergence.
- Mark A. Rosso and Stephanie W. Haas
Identification of Web Genres by User Warrant
The use of genre metadata has been proposed as a potentially beneficial supplement to general web search engines. A key issue in this solution is the selection of genre labels and definitions for web pages. What genres should be used in a general search engine? How are these genres to be identified? What are effective methodologies for collecting user terminology for the purpose of deriving web page genre labels? Three criteria for effective labels are proposed. In light of these criteria, traditional genre theory is applied to the web. The existing research literature is examined, focusing on the results of a series of studies in which the feedback of almost 300 users was solicited for the purpose of building a classification of genre labels for web pages from the .edu Internet domain. The chapter includes discussion of the implications of our findings for future studies of web genre, including recommendations for best practice.
- Marina Santini
Cross-Testing a Genre Classification Model for the Web
An Automatic Genre Identification (AGI) model is presented and cross-tested with a number of genre collections. In this difficult experimental setting, the AGI model shows some robustness and stability and its results are in line with current genre-enabled applications. The model provides some insights into open issues in AGI on the web. In particular, it shows that the diverse definitions of the concept of genre might have a strong bearing on the characterization of genre classes, thus affecting the generalizability of AGI models as a whole.
- Serge Sharoff
In the garden and in the jungle: comparing genres in the BNC and Internet
In this chapter I will present an approach to classifying the Web into genres. The goal is to have a compact system of categories that can be assigned with little ambiguity to almost every webpage. The proposed typology is organised from the functional viewpoint: generalised categories for genre classification correspond to major aims of text production, such as 'discussion' or 'instruction'. This chapter compares the genre distributions in English and Russian automatically constructed Internet corpora against their human-collected counterparts (BNC and RNC) in terms of these classes using probabilistic classifiers.
- Benno Stein and Sven Meyer zu Eissen and Nedim Lipka
Web Genre Analysis: Use Cases, Retrieval Models, and Implementation Issues
People who search the World Wide Web often have a multi-faceted understanding of their information need: they know what they are searching for, and they know of which form or type the desired documents should be. The former aspect relates to the content of a desired document (= topic), the latter to the presentation of its content and the intended target group. Due to the different user groups and the technical means of the World Wide Web, several favorite specializations of Web documents have emerged: a document may contain many links (e.g. a link collection), scientific text (e.g. a research article), almost no text but pictures (e.g. an advertisement page), or a short answer to a specific question (e.g. a mail in a help forum). These examples suggest that it can be of much help if the retrieval process is capable of addressing a user’s information need regarding what is called here “genre” or “Web genre”. This chapter contributes to Web genre analysis. It presents relevant use cases, discusses existing and new technology for the construction of Web genre retrieval models, and outlines implementation aspects for a genre-enabled Web search. Special focus is put on the generalization capability of Web genre retrieval models, for which we present new evaluation measures and, for the first time, a quantitative analysis.
== Table of Contents == (Pre-final: 5 March 2009)
Genres on the web: Computational Models and Empirical Studies
Preface by James Martin