Gender: Gender and Features Essay


DIGITAL HUMANITIES 2006

conventions is exhibited: a character nears death and expires in a room usually full of flowers and mourners who often “swoon.” The training set for this experiment includes two other texts that were scored on the same sentimentality scale, Susanna Rowson’s 1794 novel Charlotte: A Tale of Truth and Harriet Jacobs’s Incidents in the Life of a Slave Girl. Since these texts were considered sentimental, most chapters were scored in the medium or high range, so the categories were changed to “highly sentimental” and “not highly sentimental.”

With D2K, the Naive Bayes method was used to extract features from these texts, which we might call markers of sentimentality. Among the top 100 of these features, some interesting patterns have emerged, including the privileging of proper names of minor characters in chapters that ranked as highly sentimental. Also interesting are blocks of markers that appear equally prevalent, or equally sentimental, we might say: numbers 70-74 are “wet,” “lamentations,” “cheerfulness,” “slave-trade,” and “author.” The critical argument that sentimental works focus on motherhood is borne out by “mother” at number 16, while “father” does not appear in the top 100.

As we move into the next three phases of the project, we will include stemming as an area of interest in classifying the results. Phase two will use two more novels by the same authors as those in the training set; phase three may include ephemera, broadsides, and other materials collected in the EAF collection at the UVa Etext Center. Phase four will run the software on texts considered nonsentimental in the nineteenth century, and later phases might include twentieth- and twenty-first-century novels that are or are not considered sentimental. We hope to discover markers that can identify elements of the sentimental in any text.
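The abstract does not say how D2K ranks its Naive Bayes features, so the following is only a minimal sketch of one common approach: score each word by the difference between its smoothed log-probability in “highly sentimental” chapters and in “not highly sentimental” chapters, then keep the top-scoring words as candidate markers. The function names (`tokenize`, `rank_markers`), the label strings, the smoothing constant, and the toy chapters are all assumptions made for illustration, not part of the original experiment.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z][a-z'-]*", text.lower())


def rank_markers(chapters, top_n=100, alpha=1.0):
    """Rank words by how strongly they indicate the "high" class.

    chapters: list of (text, label) pairs, label in {"high", "not_high"}.
    Returns up to top_n (word, score) pairs, where score is
    log P(word | high) - log P(word | not_high) under a multinomial
    Naive Bayes model with add-alpha (Laplace) smoothing.
    """
    counts = {"high": Counter(), "not_high": Counter()}
    for text, label in chapters:
        counts[label].update(tokenize(text))

    vocab = set(counts["high"]) | set(counts["not_high"])
    totals = {label: sum(c.values()) for label, c in counts.items()}

    def log_prob(word, label):
        # Smoothed estimate of P(word | label).
        numerator = counts[label][word] + alpha
        denominator = totals[label] + alpha * len(vocab)
        return math.log(numerator / denominator)

    scored = [(log_prob(w, "high") - log_prob(w, "not_high"), w) for w in vocab]
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(word, score) for score, word in scored[:top_n]]


# Invented toy "chapters", used only to exercise the function.
toy_chapters = [
    ("the mother wept and the mourners swooned", "high"),
    ("her mother wept among the flowers", "high"),
    ("the ship sailed to the port at noon", "not_high"),
    ("he sold the goods at the port", "not_high"),
]

top_markers = rank_markers(toy_chapters, top_n=2)
```

On real chapters, a ranking of this kind would be sensitive to tokenization and to choices such as stemming, which the abstract names as an area of interest for later phases.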

Performing Gender: Automatic Stylistic Analysis of Shakespeare’s Characters

Department of Computer Science, Illinois Institute of Technology

Department of Computer Science, Bar-Ilan University

1. Introduction


A recent development in the study of language and gender is the use of automated text classification methods to examine how men and women might use language differently. Such work on classifying texts by gender has achieved accuracy rates of 70-80% for texts of different types (e-mail, novels, non-fiction articles), indicating that noticeable differences exist (de Vel et al. 2002; Argamon et al. 2003). More to the point, though, is the fact that the distinguishing language features that emerge from these studies are consistent both with each other and with other studies on language and gender. De Vel et al. (2002) point out that men prefer ‘report talk’, which signifies more independence and proactivity, while women tend to prefer ‘rapport talk’, which conveys agreeing, understanding, and supportive attitudes. Work on more formal texts from the British National Corpus (Argamon et al. 2003) similarly shows that the male indicators are mainly noun specifiers (determiners, numbers, adjectives, prepositions, and post-modifiers) indicating an ‘informational’ style, while female indicators are a variety of features indicating an ‘involved’ style (explicit negation, first- and second-person pronouns, present tense verbs, and the prepositions “for” and “with”). Our goal is to extend this research to analyze the relation between language use and gender for literary characters. To the best of our knowledge, there has been little work on understanding how novelists and playwrights portray (if they do) differential language use by literary characters of different genders. To apply automated analysis techniques, we need a clean separation of the speech of different characters in a literary work. In novels, such speech is integrated