WEBVTT

00:00:00.000 --> 00:00:10.840
Hi everyone, thank you for coming.

00:00:10.840 --> 00:00:16.080
So I will be talking about creative nonfiction and well, we know now how diverse digital

00:00:16.080 --> 00:00:17.800
humanities are, right?

00:00:17.800 --> 00:00:23.120
And so I will be approaching creative nonfiction specifically from a quantitative corpus linguistic

00:00:23.120 --> 00:00:28.560
perspective, and thank you, Gavin, for the introduction to corpus linguistics for the audience today,

00:00:28.560 --> 00:00:30.200
so I don't have to do that.

00:00:30.200 --> 00:00:36.600
Okay, well when I thought about the theme for this training day, right, fictional text

00:00:36.600 --> 00:00:42.440
and real world context, this study immediately came to mind, I did it a while ago and I thought,

00:00:42.440 --> 00:00:45.480
well, that's exactly what creative nonfiction is.

00:00:45.480 --> 00:00:48.640
So that's why we're going to start with this question.

00:00:48.640 --> 00:00:50.720
Well what is this domain really?

00:00:50.720 --> 00:00:56.120
It's got this catchy name, creative nonfiction, it's got its own abbreviation, CNF.

00:00:56.120 --> 00:01:01.520
So when we look at how it's defined, we mainly get the writer's perspective.

00:01:01.520 --> 00:01:07.200
So they say it's a literary genre, but then they immediately add it's the fourth genre,

00:01:07.200 --> 00:01:12.240
so it doesn't fit into any of the existing ones, poetry, fiction, drama, so it's the

00:01:12.240 --> 00:01:15.560
fourth genre alongside those.

00:01:15.560 --> 00:01:21.240
The goal is to tell true stories and this word true is really important for them, right,

00:01:22.040 --> 00:01:27.040
So that's the nonfiction component and that's because the stories have to be factually accurate,

00:01:27.040 --> 00:01:34.240
but then the stories can be told through a literary lens, so with attention to literary

00:01:34.240 --> 00:01:39.320
technique and that's the creative part of the term, right, so that means that there's

00:01:39.320 --> 00:01:44.520
attention to character development and point of view and narrative arc, right, so these

00:01:44.520 --> 00:01:48.320
literary devices used in creative writing.

00:01:48.360 --> 00:01:53.120
And so writers talk about an immense amount of freedom and flexibility that this gives

00:01:53.120 --> 00:01:57.880
them, right, as they draw on their memories, their experiences, observations, opinions

00:01:57.880 --> 00:02:02.280
of the real world happenings, and so in this way the genre really invites them to push

00:02:02.280 --> 00:02:04.680
boundaries.

00:02:04.680 --> 00:02:10.560
And so this freedom basically resulted in creative nonfiction becoming an umbrella term

00:02:10.560 --> 00:02:16.400
for a wide wide range of different types of texts, right, so here we have memoirs, all

00:02:16.480 --> 00:02:23.480
kinds of essays such as personal, lyric, speculative, meditative, collage, braided, you name it,

00:02:23.560 --> 00:02:29.200
any kind of essay can happen under this creative nonfiction umbrella.

00:02:29.200 --> 00:02:34.640
And then, well, writers end up saying, well, creative nonfiction can really mean anything

00:02:34.640 --> 00:02:39.640
and everything, right, different things to different people, and that's exactly the characteristic

00:02:39.640 --> 00:02:44.120
that makes it so elusive but also so alluring.

00:02:44.840 --> 00:02:50.560
Okay, well, to summarize all that, well, some parts of this can be really informative but

00:02:50.560 --> 00:02:56.280
others not so much because we see that there's little agreement on what CNF actually is,

00:02:56.280 --> 00:03:02.080
right, because of how broad it is and because it's still evolving, and yet it's gained

00:03:02.080 --> 00:03:07.680
a lot of traction, at least in the American context where this study was done.

00:03:07.680 --> 00:03:13.080
We see major social issues being presented through the lens of creative nonfiction, acclaimed

00:03:13.080 --> 00:03:19.200
writers and literary journals apply the term to their published works, there are classes in

00:03:19.200 --> 00:03:26.200
creative nonfiction, in creative writing graduate programs at universities, right, so the medium

00:03:26.720 --> 00:03:30.960
and the term have gained a lot of traction.

00:03:30.960 --> 00:03:36.000
But as a linguist, what you would notice is that all of these descriptions come from a

00:03:36.000 --> 00:03:40.760
rhetorical or literary perspective, right, so we're learning from the writers themselves

00:03:40.840 --> 00:03:46.600
what it is that they produce, but there is no linguistic account of this register or

00:03:46.600 --> 00:03:51.400
genre or text variety, whatever term you want to use, and a linguistic account would mean

00:03:51.400 --> 00:03:56.000
a focus on the specific linguistic features, specific linguistic characteristics of these

00:03:56.000 --> 00:04:02.400
texts, right, so how do, what are these texts actually comprised of, right, how do writers

00:04:02.400 --> 00:04:04.640
create these texts?

00:04:04.640 --> 00:04:10.000
And that was my motivation here, so what does this emerging register look like linguistically?

00:04:10.040 --> 00:04:15.320
What linguistic resources do the writers have at their disposal as they create these works?

00:04:15.320 --> 00:04:20.840
How do writers vary with respect to their use of these linguistic devices, and even why

00:04:20.840 --> 00:04:23.480
do they vary when they do?

00:04:23.480 --> 00:04:29.840
So I compiled a corpus of what I call modern creative nonfiction, that's because we have

00:04:29.840 --> 00:04:36.840
essays here of late 20th century, but then most of them of the 21st century, and so at

00:04:36.840 --> 00:04:40.320
the time I was actually doing some writing myself, I was taking a creative nonfiction

00:04:40.320 --> 00:04:46.240
workshop led by Lawrence Lenhart, one of my essayists here, and so he was the one who directed

00:04:46.240 --> 00:04:52.360
me to these literary journals, digital literary journals, where I could find those essays

00:04:52.360 --> 00:04:59.360
and sample them, and so Essay Daily, the New York Review, right, digital platforms like

00:04:59.360 --> 00:05:06.360
these, and I compiled a corpus of 300 essays, it's not a huge corpus you can see, but it

00:05:06.360 --> 00:05:12.520
was enough for my purposes, and I chose to do multidimensional analysis, so I'll say

00:05:12.520 --> 00:05:19.800
a few words about this method that gives insights into linguistic variation in the domain.

00:05:19.800 --> 00:05:26.440
I won't get into its technicality a lot, but I will demonstrate how it works in a minute.

00:05:26.440 --> 00:05:31.800
So multidimensional analysis identifies these general underlying patterns in discourse,

00:05:31.800 --> 00:05:37.400
and we call them dimensions of variation in discourse. These dimensions are based on linguistic

00:05:37.400 --> 00:05:44.400
feature co-occurrence, so it's about how features occur together in text to construct a certain

00:05:44.920 --> 00:05:51.920
type of discourse, right, and so the way this is done is that your texts are tagged automatically

00:05:52.640 --> 00:05:57.200
with the help of a computer program called a lexico-grammatical tagger, so these features

00:05:57.240 --> 00:06:04.240
are identified in your text, then their normed rates of occurrence are counted, are recorded,

00:06:05.120 --> 00:06:09.880
and so you have that information for each one of your texts, and then you usually start

00:06:09.880 --> 00:06:15.440
with a broad, broad range of linguistic features, right, but then you decide on the features

00:06:15.440 --> 00:06:21.400
that you want to focus on based on your domain, so linguistic features that you deem to be

00:06:21.400 --> 00:06:26.200
interesting, to be important or informative for the types of text that you work with,

00:06:26.200 --> 00:06:32.240
you select those and you include those into your analysis. And then some guidelines that

00:06:32.240 --> 00:06:39.240
come from statistics literature, right, this has to do with power for the statistical analysis

00:06:39.400 --> 00:06:45.000
here, five observations per variable, which for me meant that I had 300 texts

00:06:45.000 --> 00:06:50.960
in the corpus, and I needed to have five texts per linguistic feature, so I could work with

00:06:51.120 --> 00:06:57.720
60 features that I think are important for these essays. Okay, well then you subject

00:06:57.720 --> 00:07:03.680
your texts and these feature counts to a procedure called factor analysis, and it's exactly this

00:07:03.680 --> 00:07:09.800
statistical procedure that's going to identify these underlying patterns of linguistic variation

00:07:09.800 --> 00:07:15.600
in your text, right, essentially it's going to look at a correlation matrix, so how these

00:07:15.800 --> 00:07:22.120
features correlate, right, so it's going to find texts where these features occur together

00:07:22.120 --> 00:07:27.040
and then texts where this set of features doesn't happen so much. Then you look at these

00:07:27.040 --> 00:07:32.440
patterns and you interpret these feature co-occurrence patterns functionally, so you ask the question,

00:07:32.440 --> 00:07:37.160
what do these features really do in my texts, right, what's the function that they perform?
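
NOTE
The norming step and the five-observations-per-variable guideline described
above can be sketched in a few lines of Python. This is a minimal illustration,
not the study's actual pipeline: the feature labels, the per-1,000-words norming
basis, and the helper names normed_rates and max_features are assumptions made
for the example; only the 300 texts, the five texts per feature, and the
resulting 60 features come from the talk.
```python
from collections import Counter
#
def normed_rates(tags, n_words, per=1000):
    """Return occurrences of each tagged feature per `per` words of text."""
    counts = Counter(tags)
    return {feature: count * per / n_words for feature, count in counts.items()}
#
def max_features(n_texts, texts_per_feature=5):
    """Largest feature set allowed by the observations-per-variable guideline."""
    return n_texts // texts_per_feature
#
# Toy tagger output (hypothetical labels) for a 2,000-word essay.
tags = ["first_person_pronoun"] * 4 + ["nominalization"] * 2
rates = normed_rates(tags, n_words=2000)
# Feature budget for a 300-text corpus at five texts per feature.
budget = max_features(300)
print(rates)
print(budget)
```
Norming matters because the essays differ in length, so raw counts would not be
comparable from one text to the next.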

00:07:37.160 --> 00:07:43.760
Okay, to make this a little bit more concrete, let's look at one of my dimensions. So a dimension

00:07:43.760 --> 00:07:49.600
is a cline from one set of features to another, we call them positive and negative, so the

00:07:49.600 --> 00:07:53.960
two sets of features are in complementary distribution, when the positive features happen

00:07:53.960 --> 00:07:58.560
a lot in text to construct a certain type of discourse, negative features are relatively

00:07:58.560 --> 00:08:03.320
infrequent, right, and vice versa, so when the negative features work together to construct

00:08:03.320 --> 00:08:10.320
some kind of discourse, then the positive features happen relatively rarely, okay. So

00:08:10.400 --> 00:08:14.120
let's take a minute to skim through those features, so in the positive end we have first

00:08:14.120 --> 00:08:18.400
and second person pronouns, then we have some indefinite and demonstrative pronouns, so

00:08:18.400 --> 00:08:24.400
these pronominal features on the one hand, we also have some verbs, right, that deletion,

00:08:24.400 --> 00:08:29.680
private verbs, public verbs, some verbal features, some adverbial features such as emphatics

00:08:29.680 --> 00:08:36.680
and hedges, right, so characteristics of the verbs. Okay, on the negative end now, we have

00:08:37.000 --> 00:08:41.720
some nominal features, attributive adjectives, prepositional phrases, right, prepositions

00:08:41.720 --> 00:08:48.560
are usually associated with nouns, nominalizations, these complex nouns, by-passives. Well, your

00:08:48.560 --> 00:08:53.400
next step is then looking at these features in context of course, right, because by themselves

00:08:53.400 --> 00:09:00.400
this is, well, they're only informative to a certain extent. So each text receives a

00:09:01.400 --> 00:09:08.400
dimension score, okay, so you are able to now plot each of your texts along the dimension,

00:09:08.760 --> 00:09:15.120
right, that's the y-axis, and so the box plots here represent the spread of text scores for

00:09:15.120 --> 00:09:20.360
each of my authors, and the red square in the middle is the average of those text scores,

00:09:20.360 --> 00:09:27.040
the mean, alright. So a text that happens up here for example, and you can see that

00:09:27.040 --> 00:09:33.160
it has a high positive score, has a lot of positive features and those are bolded here,

00:09:33.160 --> 00:09:37.560
and the negative features are present, they're underlined, but compared to the positive features,

00:09:37.560 --> 00:09:41.600
well, they're not that frequent, right, and so when we look at a text like that and we

00:09:41.600 --> 00:09:46.320
see how these positive features work together, we understand what kind of discourse is created

00:09:46.320 --> 00:09:52.560
through that set of features. Then on the other extreme, this is a text with a low negative

00:09:53.560 --> 00:09:58.120
score, the negative features are underlined, and you see that barely any positive features

00:09:58.120 --> 00:10:04.640
are present. Okay, well now that we've looked at these features in context, we can offer

00:10:04.640 --> 00:10:11.480
or propose some interpretive label for the dimension, right, so for me this contrast

00:10:11.480 --> 00:10:17.320
here had to do with interactivity and involvement on the one hand, and informational focus or

00:10:17.320 --> 00:10:21.920
informational style on the other hand, because these nominal features here, they are associated

00:10:21.920 --> 00:10:27.040
with information density, right, nominal structures, this is how we package information,

00:10:27.040 --> 00:10:31.440
they're really economical, and then verbal structures, pronouns and adverbs as we said

00:10:31.440 --> 00:10:36.080
on the positive end, that's interactivity and involvement. It's not just me interpreting

00:10:36.080 --> 00:10:40.600
this, there's been other research before this of course that reached these same conclusions.
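
NOTE
Dimension scores of the kind discussed above are conventionally computed in
multidimensional analysis by standardizing each feature's normed rate across
the corpus, then summing the standardized rates of the positive features and
subtracting those of the negative features. The sketch below assumes that
convention; the talk does not spell out the exact formula used in this study,
and the feature names, the toy rates, and the use of the population standard
deviation are illustrative assumptions.
```python
from statistics import mean, pstdev
#
def zscores(values):
    """Standardize one feature's rates across all texts (population SD)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s else 0.0 for v in values]
#
def dimension_scores(rates, positive, negative):
    """rates maps feature -> list of normed rates, one per text, same order."""
    z = {feature: zscores(values) for feature, values in rates.items()}
    n_texts = len(next(iter(rates.values())))
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(n_texts)
    ]
#
# Three toy texts: the first is pronoun-heavy, the second is noun-heavy.
rates = {
    "first_person_pronoun": [60.0, 20.0, 40.0],
    "private_verb": [30.0, 10.0, 20.0],
    "nominalization": [10.0, 50.0, 30.0],
}
scores = dimension_scores(
    rates,
    positive=["first_person_pronoun", "private_verb"],
    negative=["nominalization"],
)
print(scores)
```
In this toy data the first text lands at the positive (interactive) end, the
second at the negative (informational) end, and the third in the middle.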

00:10:40.600 --> 00:10:47.480
Alright, well, so this study identified four such dimensions, so a contrast between interactive

00:10:47.480 --> 00:10:53.480
and informational style, an abstract expository versus concrete descriptive style, immediate

00:10:53.480 --> 00:10:57.560
versus removed style, this had to do with the temporal focus, so immediacy, present

00:10:57.560 --> 00:11:05.060
orientedness versus past orientedness and narration, and because the reader was removed

00:11:05.060 --> 00:11:09.520
from the narrator, right, or the narrator, it felt like the narrator was removed from

00:11:09.520 --> 00:11:16.240
the reader or the listener, that's why I chose this label. And then the fourth one,

00:11:16.240 --> 00:11:21.800
hypothetical style, but in this talk I'll only focus on the first three dimensions.

00:11:21.800 --> 00:11:28.860
So what can we do, how do we compare the authors, right, or how do we analyze authorial style

00:11:28.860 --> 00:11:34.360
along these dimensions? So as I said, there are means plotted for each of my authors,

00:11:34.360 --> 00:11:39.440
right, the average across each author's text, there are also standard deviations, we'll

00:11:39.440 --> 00:11:45.520
come back to them later, so standard deviation shows variance across texts, right, how far

00:11:45.800 --> 00:11:51.120
removed my individual data points are from the mean. So we notice that here, Gutkind

00:11:51.120 --> 00:11:55.640
and Hawkins are the two authors that form the most striking contrast, right, and we're

00:11:55.640 --> 00:12:02.640
talking only about the averages here, so for Hawkins, all of his texts on average are much

00:12:02.680 --> 00:12:07.920
higher than the other authors' texts on the dimension, right, so the mean is higher than

00:12:07.920 --> 00:12:12.600
the other means, that means that generally this author gravitates towards the positive

00:12:12.680 --> 00:12:18.360
range of the dimension, right, towards interactivity. And then Gutkind, again, generally, right,

00:12:18.360 --> 00:12:23.000
we do see a spread, but we're talking about some general preferences here, the central

00:12:23.000 --> 00:12:29.680
trend, generally this author gravitates towards the negative range of the dimension. And so

00:12:29.680 --> 00:12:36.680
we look at some text examples, so this is a text from Gutkind, this is a text from Hawkins,

00:12:37.560 --> 00:12:43.680
and so what we notice is dense use of negative nominal features in Gutkind's prose, prepositional

00:12:43.680 --> 00:12:48.560
phrases, nominalizations, attributive adjectives, and this is actually something that Gutkind

00:12:48.560 --> 00:12:53.720
has identified himself about CNF, so when this writer talked about what the genre is

00:12:53.720 --> 00:12:59.720
for him, he said, well, the goal is to communicate information, just like a reporter, and the

00:12:59.720 --> 00:13:05.000
writer's role here is to be engaged in the world confident and informed. So we don't

00:13:05.080 --> 00:13:09.560
know whether this writer is doing this intentionally or not, but we do see alignment between our

00:13:09.560 --> 00:13:16.560
linguistic results, right, the types of features that this writer is using, and how he conceptualizes

00:13:16.560 --> 00:13:23.200
CNF, right, what he believes the goal is. And then on the other hand, Hawkins, so consistent

00:13:23.200 --> 00:13:29.320
use of first person pronouns, private verbs, frequent emphatics, right, so interactive

00:13:29.320 --> 00:13:34.760
and involved discourse. And again, we're talking about general trends for this author's

00:13:34.920 --> 00:13:41.480
texts. Second dimension, abstract expository versus concrete descriptive style, so you

00:13:41.480 --> 00:13:46.700
notice abstract nouns on the positive end, so they are responsible for the abstract part

00:13:46.700 --> 00:13:53.700
of the label, but then these verb complement clauses, this is how exposition happens, right,

00:13:55.280 --> 00:14:00.920
reasoning, explanation happens through those verb complement clauses. On the negative extreme

00:14:00.920 --> 00:14:07.520
concrete nouns, these ing and ed participial structures, they are the ones that in this

00:14:07.520 --> 00:14:14.520
kind of discourse create concreteness. Okay, so again, to look at some author contrasts,

00:14:15.080 --> 00:14:20.560
we notice that Steinberg and Tevis are the writers whose texts are exclusively in the

00:14:20.560 --> 00:14:25.800
positive and in the negative range respectively, right, all of their texts actually happen

00:14:25.960 --> 00:14:32.960
in different dimension ranges. We'll look at the texts and we clearly see, well, prevalence

00:14:35.360 --> 00:14:41.080
of negative features here, description, in Tevis' writing, and then lots of abstract

00:14:41.080 --> 00:14:47.760
expository features in Steinberg's text. So, abundance of concrete nouns, what this does

00:14:47.760 --> 00:14:53.720
in Tevis' text is, well, they denote objects and places, right, it creates a highly descriptive,

00:14:53.720 --> 00:15:00.720
highly vivid concrete text, but then in contrast, Steinberg here talks about the writing process,

00:15:02.120 --> 00:15:09.120
analytical thinking, the audience's thoughts and feelings, so features of stance, right,

00:15:09.520 --> 00:15:16.120
verb complement clauses accomplish this function, and so journalists who have analyzed Steinberg's

00:15:16.120 --> 00:15:21.000
writing have actually said, well, there's this emphasis on exposition and reasoning,

00:15:21.160 --> 00:15:28.160
so again, similar conclusions have been reached in other fields, by other scholars who work in

00:15:28.320 --> 00:15:34.560
very different research traditions, but we call this triangulation, it's always nice to

00:15:34.560 --> 00:15:40.720
see similar conclusions reached through other methods. So, they've talked about this author's

00:15:40.720 --> 00:15:47.720
preoccupation with the inner narrative, reflection, self-analysis, things like these.

00:15:48.720 --> 00:15:54.360
Okay, so the third dimension, immediate versus removed style, as I said, has to do with different

00:15:54.360 --> 00:16:01.360
temporal orientation, Hawkins and Wallace here form a pretty striking contrast, we look

00:16:03.320 --> 00:16:09.800
at some text examples again, and so we see that for Hawkins, there's this focus on the

00:16:09.800 --> 00:16:14.520
past events, so the narrator here reminisces about the past events, there are very rare and

00:16:14.520 --> 00:16:21.520
very sudden switches to the present, so they're there, but quite rarely, and what these switches

00:16:23.440 --> 00:16:30.440
do is they have this effect of an aside where the reader is pulled in very briefly, so the

00:16:30.480 --> 00:16:35.080
writer here says, yeah, there's a succession of events, I screamed, but that's what you're

00:16:35.080 --> 00:16:39.720
supposed to do, right, and then they go back to the narrative, right, so it's this rapid

00:16:40.000 --> 00:16:45.440
switch, but really the overwhelming focus is on the past. For Wallace, there is a pervasive

00:16:45.440 --> 00:16:51.240
use of present tense, and when we start analyzing the feature in his writing, there's a focus

00:16:51.240 --> 00:16:56.880
on the internal experience, and literary critics have picked up on that, so Langer talks about

00:16:56.880 --> 00:17:02.640
this pure present, expression of timelessness, so they say, well, when it comes to contemplation

00:17:02.640 --> 00:17:09.640
and emotion, time is not relevant, so there is this timeless present in their words, and

00:17:10.520 --> 00:17:14.640
speaking specifically about Wallace's writing, Seguin said, the voices in Wallace are the

00:17:14.640 --> 00:17:20.600
very mark of a kind of unique inwardness or subjective style, so we wouldn't normally

00:17:20.600 --> 00:17:25.720
associate the present tense with subjectivity or inwardness, right, but when we get this

00:17:25.720 --> 00:17:30.400
prevalence, right, so the technique has shown us that the feature is so prevalent, and we

00:17:30.400 --> 00:17:34.600
start analyzing it, and it turns out that this effect that the literary critics have

00:17:34.800 --> 00:17:40.600
identified is actually achieved to a large extent through that particular feature.

00:17:40.600 --> 00:17:47.600
Okay, well, so one thing we've done is we've compared authors, right, on each of the dimensions,

00:17:48.360 --> 00:17:52.560
we've looked at some noticeable contrast, but we focused on the general tendencies,

00:17:52.560 --> 00:17:59.560
right, on the means only. What if our question is a particular person's style, right, so

00:17:59.720 --> 00:18:05.200
we want to describe a certain individual's use of language, well, so the way to do this

00:18:05.200 --> 00:18:11.880
in this approach is in multidimensional space, and that's because each dimension adds something

00:18:11.880 --> 00:18:15.800
to this description, right, so they're interrelated, but they're distinct, they're unique, and

00:18:15.800 --> 00:18:21.800
they tell us something new about language use, so if we chose a particular person, then

00:18:21.800 --> 00:18:27.280
we would want to say something about their language use with respect to each of these

00:18:27.280 --> 00:18:30.840
dimensions. So I picked just a couple of names, they're

00:18:30.840 --> 00:18:34.960
kind of big names in creative non-fiction, so here we have D'Agata's style on the three

00:18:34.960 --> 00:18:40.000
dimensions, so a balance of interactive and informational text on dimension one, markedly

00:18:40.000 --> 00:18:45.920
abstract style on dimension two, each is characterized through different features, right, that's

00:18:45.920 --> 00:18:51.400
why we're saying that they're distinct, and the whole dimension range from removed style to

00:18:51.400 --> 00:18:57.800
quite immediate on dimension three. This is David Foster Wallace, generally informational

00:18:57.800 --> 00:19:03.280
on dimension one, a balance of abstractedness and concreteness on dimension two, highly

00:19:03.280 --> 00:19:08.640
immediate style on dimension three. Sometimes there's this nice consistency, so

00:19:08.640 --> 00:19:14.080
Joan Didion here, generally informational, generally abstract, generally removed, right,

00:19:14.080 --> 00:19:19.160
so, but again, it says something about this person, right, something they do consistently in their

00:19:19.160 --> 00:19:23.160
works. Okay, and one other thing that we can do,

00:19:23.160 --> 00:19:29.280
we can even go a step further and look within a particular writer, right, so what this visual

00:19:29.280 --> 00:19:36.080
here shows is that there's variation across each person's works, right, so far we've

00:19:36.080 --> 00:19:42.120
talked about general tendencies again, but we said that standard deviations are really

00:19:42.120 --> 00:19:48.080
important, right, they focus us on the, not on the mean, but on the spread of scores,

00:19:48.080 --> 00:19:53.600
and so this is what's highlighted here, that's actually the case for most of my authors.

00:19:53.600 --> 00:20:00.520
So I selected one author on each dimension who showed the most variation, and so Philip

00:20:00.520 --> 00:20:06.680
Lopate here on dimension one, again, people have seen this, right, I have talked about

00:20:06.680 --> 00:20:13.140
this tendency of Lopate to be lively and engaged on the one hand, but then also attribute importance

00:20:13.300 --> 00:20:20.300
to density of thought, right, and so this versatility has been talked about, what we

00:20:20.300 --> 00:20:25.940
see is the linguistic evidence of that. All right, but my question was, what's the

00:20:25.940 --> 00:20:31.220
basis for this linguistic variation within a single person? Could it be that this linguistic

00:20:31.220 --> 00:20:36.620
variation is explained through some communicative reason? And so I focused on a communicative

00:20:36.620 --> 00:20:42.900
parameter, so I selected communicative purpose, and I read the essays, and I coded each essay

00:20:42.900 --> 00:20:50.180
for its communicative purpose, okay, so I identified purpose groups

00:20:50.180 --> 00:20:57.100
within this author's, or across this author's works, address a question, describe a person,

00:20:57.100 --> 00:21:01.620
narrate and reflect, review and speculate, so this is meant to show that all of these

00:21:01.620 --> 00:21:09.460
purpose groups happen within Lopate's works, right. Okay, so what I saw was that there

00:21:09.620 --> 00:21:14.660
were actually statistically significant differences among these purpose groups, so this

00:21:14.660 --> 00:21:21.620
meant that the differences in these purpose groups predicted language use on the dimension,

00:21:21.620 --> 00:21:27.780
so different purpose groups, different essays, depending on their purpose, use the language

00:21:27.780 --> 00:21:34.940
on the dimensions, on this first dimension differently, and so I identified this communicative

00:21:34.980 --> 00:21:40.220
basis for the linguistic variation across his text this way, so we're not going to spend

00:21:40.220 --> 00:21:44.980
a lot of time on these texts, but you can see just by looking at the highlights, right,

00:21:44.980 --> 00:21:51.980
the balance shifts, right, so this essay up there, describe a person, is more involved

00:21:53.020 --> 00:21:58.260
and interactive, then we get addressing a question, right, where some features of information

00:21:58.260 --> 00:22:04.700
density start creeping in, and then review is the most informational. So I did the same

00:22:04.700 --> 00:22:10.660
for David Shields on immediate versus removed style dimension, these are the purpose groups,

00:22:10.660 --> 00:22:16.220
analyze and reflect, argue a point, narrate and review, again statistically significant

00:22:16.220 --> 00:22:21.780
differences among the groups, which meant that the language is used differently depending

00:22:21.780 --> 00:22:28.780
on what the communicative purpose of the text is, so analyze and reflect, markedly present-oriented,

00:22:29.780 --> 00:22:36.780
then arguing a point, some features of past orientation, and then of course narrate, expectedly

00:22:36.780 --> 00:22:43.140
uses features of narration, right. Okay, well the picture is a bit more complex, but maybe

00:22:43.140 --> 00:22:47.500
a bit more interesting for Ander Monson on abstract expository versus concrete descriptive

00:22:47.500 --> 00:22:54.500
style, I did the exact same thing, but I also saw as I was reading these essays, that just

00:22:54.500 --> 00:23:01.500
regardless of which group I put them in, they all had a massive amount of reflection,

00:23:01.500 --> 00:23:05.500
so they were essentially doing the same thing, every single type of text had reflection in

00:23:05.500 --> 00:23:12.060
it, and so then I didn't see differences in terms of the language that they used, so purpose

00:23:12.060 --> 00:23:16.060
didn't predict linguistic variation here, and that was not surprising to me, because

00:23:16.060 --> 00:23:22.180
well I knew at that point that reflection was everywhere, but so what I did then, I

00:23:22.180 --> 00:23:28.460
used cluster analysis, so this bottom-up way, again involving an automatic technique, to

00:23:28.460 --> 00:23:33.140
tell me maybe there's a different basis or a different way to group text, right, that

00:23:33.140 --> 00:23:39.220
I wasn't seeing as a human analyst, so cluster analysis groups these texts that are close

00:23:39.220 --> 00:23:46.220
in their linguistic dimension scores, and then I went in asking the question, well okay,

00:23:46.380 --> 00:23:52.940
these three clusters that it gave me, why are they distinct, right, so what is the basis

00:23:52.940 --> 00:23:58.500
there, they really are different, we can see here in this plot, and turns out that for

00:23:58.500 --> 00:24:05.500
Ander Monson variation on this dimension has to do with the degree of reliance on concrete

00:24:05.660 --> 00:24:12.060
illustration or rather the types of examples that this author uses, so we would need to

00:24:12.140 --> 00:24:17.980
read more of the text, but basically some texts use very concrete illustrations, right,

00:24:17.980 --> 00:24:24.100
again through specific objects, right, that type of examples, but then other texts prove

00:24:24.100 --> 00:24:30.620
their point through a discussion, right, that somehow illustrates their, drives the message

00:24:30.620 --> 00:24:37.620
home, okay, so yeah, very different strategies in terms of how argumentation happens, right,

00:24:38.620 --> 00:24:44.620
or what kind of examples this author uses, and that results in variation on this dimension.
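
NOTE
Grouping texts by their dimension scores, as described above, can be
illustrated with a small one-dimensional k-means routine. The talk does not
name the clustering algorithm, so k-means, the function name kmeans_1d, the
evenly spread starting centres, and the toy scores are all assumptions; only
the idea of three clusters of texts with similar dimension scores comes from
the talk.
```python
from statistics import mean
#
def kmeans_1d(scores, k, iterations=100):
    """Cluster dimension scores into k groups (k must be at least 2)."""
    ordered = sorted(scores)
    # Deterministic start: spread the initial centres over the sorted scores.
    centres = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for s in scores:
            # Assign each score to its nearest centre.
            nearest = min(range(k), key=lambda c: abs(s - centres[c]))
            clusters[nearest].append(s)
        # Move each centre to the mean of its cluster; keep it if the cluster is empty.
        updated = [mean(c) if c else centres[i] for i, c in enumerate(clusters)]
        if updated == centres:
            break
        centres = updated
    return centres, clusters
#
# Toy dimension scores for nine essays by one author.
scores = [-6.1, -5.4, -5.9, 0.2, -0.3, 0.5, 4.8, 5.5, 5.1]
centres, clusters = kmeans_1d(scores, k=3)
print(centres)
print(clusters)
```
The clusters only say which texts pattern together; working out why they do,
as in the talk, still means going back and reading the essays.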

00:24:44.620 --> 00:24:51.620
Okay, well zooming out, just to ask what did we see here, and really who is this work for,

00:24:52.620 --> 00:24:59.620
and what can we do with this, so we saw linguistic descriptions of discourse domain, right, or

00:24:59.620 --> 00:25:04.620
a linguistic account of a discourse domain, and this can be done in any domain at all,

00:25:04.620 --> 00:25:10.620
and this linguistic description was done through, or by focusing on general patterns

00:25:10.620 --> 00:25:16.260
of linguistic variation, right, dimensions based on feature co-occurrence rather than

00:25:16.260 --> 00:25:22.100
some specific linguistic feature. We looked at author differences first in the use of

00:25:22.100 --> 00:25:29.100
language, right, on those dimensions. We then said that if we're interested in a particular

00:25:29.900 --> 00:25:36.900
person's style, then we could analyze their style in multidimensional space, so with respect

00:25:36.980 --> 00:25:42.620
to the features of each of these dimensions, and then we can go further, we can look at

00:25:42.620 --> 00:25:49.620
intra-author variation, and in my study specifically, it turned out that this variation was due to

00:25:50.300 --> 00:25:57.300
communicative reasons. Well, so who is this for, or is there use for this beyond these

00:25:58.220 --> 00:26:04.020
studies? Well, of course this is for researchers who care about stylistics, right, computational

00:26:04.020 --> 00:26:10.460
stylistics, they have talked about complex interplay of stylistic features as opposed

00:26:10.460 --> 00:26:15.620
to choosing some specific features as predictors of authorship, for example, so they actually

00:26:15.620 --> 00:26:19.980
say that a complex interplay of stylistic features, well, first we should acknowledge

00:26:19.980 --> 00:26:26.980
that linguistic features don't happen independently of each other, right, so they actually co-occur

00:26:27.540 --> 00:26:34.540
rather than, well, one feature at a time, but they actually point out a greater predictive

00:26:34.540 --> 00:26:41.020
power of a set of features, so people who do authorship attribution, authorship recognition,

00:26:41.020 --> 00:26:44.780
so they say, well, if we look at several features at a time, right, then that's going to be

00:26:44.780 --> 00:26:50.860
more powerful, more informative when we tell authors apart, right, when we're trying to

00:26:50.860 --> 00:26:56.640
automatically identify that. Then people have talked about selectiveness of stylistic

00:26:56.640 --> 00:27:00.800
analysis, right, and that's based on intuition, often we decide, well, what features are we

00:27:00.800 --> 00:27:06.160
going to look at when we're going to compare authors, I guess, stance features are interesting,

00:27:06.160 --> 00:27:13.160
right, but that's our decision, so this method actually resolves that problem, right, because

00:27:13.360 --> 00:27:18.680
it identifies the features that are important for the discourse domain, and then we work

00:27:18.680 --> 00:27:24.360
with whatever we know already is important for the discourse domain. All right, but going

00:27:24.440 --> 00:27:30.940
beyond research, of course, maybe more real world applications, so this matters or should

00:27:30.940 --> 00:27:35.920
matter hopefully to writers, to teachers of creative writing, to translators, so people

00:27:35.920 --> 00:27:42.480
who actually create, produce these texts, right, or any kinds of other texts, right,

00:27:42.480 --> 00:27:49.480
that people learn to write, so it gives us an insight into the specific linguistic means

00:27:50.160 --> 00:27:54.840
that comprise a certain type of discourse, right, so if you want to achieve a certain

00:27:54.840 --> 00:28:00.840
effect, right, then these are the features that do that, and this is how they work together.

00:28:00.840 --> 00:28:06.120
Well, and then I thought, well, how about readers, how about end users of digital platforms,

00:28:06.120 --> 00:28:12.080
so all of us, and digital humanists, right, so does this, is this relevant for these other

00:28:12.080 --> 00:28:17.520
disciplines, right, these other research traditions? Well, maybe, maybe we're going to start asking

00:28:17.720 --> 00:28:22.200
questions such as, well, really, what is the fiction, non-fiction interplay there, right,

00:28:22.200 --> 00:28:26.400
because, well, these essays in large part exist on digital platforms, right, but we

00:28:26.400 --> 00:28:32.080
could ask that same question about any other digital domain. What kind of language do we

00:28:32.080 --> 00:28:38.520
produce and receive on digital platforms because of how varied their affordances are? Well,

00:28:38.520 --> 00:28:44.480
the language is similarly just as varied, right, so the language actually reflects the

00:28:44.520 --> 00:28:50.800
variety of those affordances, and how does that language vary and why, right, so what's

00:28:50.800 --> 00:28:57.800
the basis for that variation? Some references, and thank you so much.

