1. Training and Testing Data
2. N-Fold Cross Validation
3. Annotation
Example: Flame recognition in newsgroups
4. Kappa and Inter-Coder Reliability
Training and Testing Data
Recall that we want to build a classifier:
Training Data ----> Machine Learning Algorithm ---> Classifier
To build a probabilistic classifier, the ML algorithm estimates a probability
model
p(f1, f2,..., fn, C)
over all values of the features f1, f2,..., fn and the class C.
Given a feature vector (f1, f2,..., fn) and the probabilities
p(f1, f2,..., fn, C1), p(f1, f2,..., fn, C2),....., p(f1, f2,..., fn, Ck),
the classifier chooses the class Cj for which p(f1, f2,..., fn, Cj)
is greatest.
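The decision rule above can be sketched in Python. This is only an illustration: the feature values, class names, and probabilities in TABLE are invented, and a real system would estimate p(f1, f2,..., fn, C) from training data rather than list it by hand.

```python
def classify(features, joint_prob, classes):
    """Return the class C for which p(features, C) is greatest."""
    return max(classes, key=lambda c: joint_prob(features, c))


# Hypothetical joint probabilities p(f1, f2, C); the numbers are made up.
TABLE = {
    (("has_insult", "second_person"), "flame"): 0.08,
    (("has_insult", "second_person"), "non-flame"): 0.02,
    (("no_insult", "second_person"), "flame"): 0.01,
    (("no_insult", "second_person"), "non-flame"): 0.30,
}


def toy_joint_prob(features, c):
    """Look up p(features, c) in the toy table; unseen combinations get 0."""
    return TABLE.get((features, c), 0.0)


label = classify(("has_insult", "second_person"), toy_joint_prob,
                 ["flame", "non-flame"])
print(label)  # flame
```

Only the comparison between classes matters here, so the probabilities do not need to be normalized for a given feature vector.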
So to build the classifier we need training data. We also need to evaluate
the system somehow, i.e., does it work?
For this we need test data.
For supervised learning, both training and test data need answers (the
correct classifications). Perhaps such data already
exists; otherwise it will have to be annotated by humans.
But first, do these two data sets really need to be separate?
Suppose you test the classifier on the same data that you used to develop
it. What happens?
It should work very well, maybe too well, and you
still don't know how it will do on anything else. We need to try it
on new data (test data) if we are to learn anything
meaningful.
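A small sketch makes the point concrete. The classifier below is a deliberately extreme, hypothetical one that simply memorizes its training examples; the messages and labels are invented for the illustration.

```python
from collections import Counter


def train_memorizer(examples):
    """Memorize every (message, label) pair; for unseen messages,
    fall back to the most frequent label in the training data."""
    table = dict(examples)
    default = Counter(label for _, label in examples).most_common(1)[0][0]
    return lambda message: table.get(message, default)


def accuracy(classifier, examples):
    """Fraction of examples whose predicted label matches the answer."""
    correct = sum(1 for message, label in examples
                  if classifier(message) == label)
    return correct / len(examples)


# Invented toy data.
train = [("you idiot", "flame"),
         ("nice photo", "non-flame"),
         ("thanks a lot", "non-flame")]
test = [("what an idiot", "flame"),
        ("great lens", "non-flame")]

clf = train_memorizer(train)
print(accuracy(clf, train))  # 1.0
print(accuracy(clf, test))   # 0.5
```

The perfect score on the training data says nothing about new messages; only the score on the separate test data is informative.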
This leads to the problem of OVERFITTING: the probability model (more
generally, the classifier)
fits the training data too well and will not generalize
well to other data. So we need to find a balance
between fitting the training data well and generalizing
to other data. One way to do this is:
N-Fold Cross Validation
Here we divide the training data into n parts. In each of n experiments,
one part serves as pseudo test data and the remaining parts as pseudo
training data. We take the model
that does best on average over the n experiments.
Let's look at an example of 3-fold cross validation:
Divide the training data into 3 equal parts: A, B, C
Experiment | Pseudo training data | Pseudo test data |
1 | A, B | C |
2 | A, C | B |
3 | B, C | A |
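The same scheme can be sketched in Python for any n. The function names n_fold_splits and cross_validate, and the train and evaluate callbacks, are assumptions made for this illustration and do not come from any particular library.

```python
def n_fold_splits(data, n):
    """Yield (pseudo_training, pseudo_test) pairs, one per experiment.

    The data is cut into n contiguous, roughly equal parts. Experiments are
    generated in the order of the table above: the last part is held out
    first.
    """
    size = len(data)
    folds = [data[i * size // n:(i + 1) * size // n] for i in range(n)]
    for held_out in reversed(range(n)):
        pseudo_test = folds[held_out]
        pseudo_training = [x for i, fold in enumerate(folds)
                           if i != held_out for x in fold]
        yield pseudo_training, pseudo_test


def cross_validate(data, n, train, evaluate):
    """Average evaluate(model, pseudo_test) over the n experiments."""
    scores = [evaluate(train(tr), te) for tr, te in n_fold_splits(data, n)]
    return sum(scores) / n


# With six items, the parts are A = [0, 1], B = [2, 3], C = [4, 5].
for pseudo_training, pseudo_test in n_fold_splits(list(range(6)), 3):
    print(pseudo_training, pseudo_test)
```

To choose between several candidate models, one would call cross_validate once per candidate and keep the one with the best average score.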
Annotation
Getting started:
Problem Definition
Plans for acquiring and annotating the data
Plans for evaluation
A case study: distinguishing flames in newsgroups
Given the general definition that a flame is an
abusive or insulting attack on a person on a newsgroup,
or on their beliefs,
let's look
at some examples:
WARNING : These examples are taken from actual newsgroups and
in no way represent the thoughts,
opinions, or beliefs of the research group or myself.
You may find some of the following offensive.
%%*******************************************
<ANN flame="", cert="", id="97:908.11:1" /ANN>
__Thread-ID: 11:1:2
__Thread-Members: 97:908 97:269
__Build-Date: Thu Dec 2 20:40:48 MST 1999
__Built-By: dtappan
__Recoded-From: 97:219297
__Recoded-To: 97:908
Path: news.NMSU.Edu!lynx.unm.edu!newshub.tc.umn.edu!logbridge.uoregon.edu!newsswitch.lcs.mit.edu!remarQ-easT!rQdQ!supernews.com!remarQ.com!corp.supernews.com!not-for-mail
From: edromney <romney@edromney.com>
Newsgroups: rec.photo.equipment.35mm
Subject: Re: HOW WE PRE-SCREEN OUR CUSTOMERS
Date: Thu, 30 Sep 1999 13:45:07 -0500
Organization: romney
Lines: 35
Message-ID: <37F3AFB1.5193847@edromney.com>
References: <37F39CAC.83ACAC69@edromney.com> <19990930131815.15975.00000131@ng-bk1.aol.com>
Reply-To: romney@edromney.com
X-Complaints-To: newsabuse@supernews.com
X-Mailer: Mozilla 4.04 (Macintosh; U; PPC)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; x-mac-type="54455854";
x-mac-creator="4D4F5353"
Content-Transfer-Encoding: 7bit
Xref: news.NMSU.Edu rec.photo.equipment.35mm:219297
ArtKramr wrote:
> >Subject: HOW WE PRE-SCREEN OUR CUSTOMERS
> >From: edromney romney@edromney.com
> >Date: Thu, 30 September 1999 01:23
>
> >My Christian PR on this
> >newsgroup and elsewhere, the praying hands image in our catalog,
the
> >church steeple on our web page... all these seem to attract nice
good
> >Christian people... and they also make bad people avoid us.
>
> Guess you won't do business with Jews, Muslims, Buhdists or any other
people
> who are not "nice good Christians". Is that right?
Learn to spell Buddhist, you ignorant liberal dummy....you are
all nothing but
a pack of cards...
%%*******************************************
<ANN flame="", cert="", id="13:97.2:2" /ANN>
__Thread-ID: 2:2:2
__Thread-Members: 13:10 13:97
__Parent-Num: 13:10
__Build-Date: Thu Dec 2 22:30:37 MST 1999
__Built-By: dtappan
__Recoded-From: 13:60747
__Recoded-To: 13:97
Path: news.NMSU.Edu!lynx.unm.edu!newshub.tc.umn.edu!logbridge.uoregon.edu!newsfeed.direct.ca!newspeer.monmouth.com!nntp2.deja.com!nnrp1.deja.com!not-for-mail
From: smi@my-deja.com
Newsgroups: alt.news.macedonia
Subject: Re: The census in Pirin Macedonia and entrails.
Date: Fri, 01 Oct 1999 14:24:38 GMT
Organization: Deja.com - Before you buy.
Lines: 34
Message-ID: <7t2g6o$vnm$1@nnrp1.deja.com>
References: <37ECF739.7591FD12@sympatico.ca> <7sogv3$pir$1@nnrp1.deja.com>
<37F03DB4.F89C06A4@erols.com> <7sthnd$cah$1@nnrp1.deja.com> <7subrd$3eug$1@newssvr03-int.news.prodigy.com>
<7svip5$qoe$1@nnrp1.deja.com> <7t10hq$257c$1@newssvr04-int.news.prodigy.com>
NNTP-Posting-Host: 208.145.41.12
X-Article-Creation-Date: Fri Oct 01 14:24:38 1999 GMT
X-Http-User-Agent: Mozilla/4.02 [en] (X11; I; SunOS 5.5.1 sun4u)
X-Http-Proxy: 1.0 bx6.deja.com:80 (Squid/1.1.22) for client 208.145.41.12
X-MyDeja-Info: XMYDJUIDsmi
Xref: news.NMSU.Edu alt.news.macedonia:60747
OK, could be dangerous, all women are in certain sense:-). No, seriously
this is supposed to be a discussion forum and everyone should be open
to
discussion and discussion means be factual, not just throwing
words
around. If someone's trying to be dangerous, problem stays with him.
Thnx anyway.
smi
In article <7t10hq$257c$1@newssvr04-int.news.prodigy.com>,
"June R Harton" <JUNEHARTON@prodigy.net> wrote:
> smi@my-deja.com wrote
> >"June R Harton" <JUNEHARTON@prodigy.net> wrote:
> >>>Also thanks to those few of you who abandantly supported the things
> >>>they said with facts,
> >>Galina's 'facts' are always lies.
> >I had in mind your facts, buddy, not hers. She didn't provide any.
> >But she was not cursing around, too.
>
> OK, sorry, but that woman is dangerous. She gives the apparancy
> of niceness then sticks the knife in and twists. Not content with
that,
> she sticks her arm in and pulls out the entrails.
>
> I am NOT joking. She is evil.
>
> from: Spirit Of The Real Makedon
> (using June's
e-mail to communicate to you)!
>
> ........The heart of Macedonia was always Greek
>
>
Sent via Deja.com http://www.deja.com/
Before you buy.
%%*******************************************
<ANN flame="", cert="", id="19:647.236:1" /ANN>
__Thread-ID: 236:1:2
__Thread-Members: 19:647 19:649
__Build-Date: Thu Dec 2 23:13:56 MST 1999
__Built-By: dtappan
__Recoded-From: 19:256068
__Recoded-To: 19:647
Path: news.NMSU.Edu!lynx.unm.edu!newshub.tc.umn.edu!news.eecs.umich.edu!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!newsfeed.enteract.com!ix.netcom.com!news
From: toyboat*spamless*@bayarea.net (just call me "TEX")
Newsgroups: alt.punk
Subject: Re: I'm listening to
Date: Sun, 14 Feb 1999 18:55:39 GMT
Organization: Netcom
Lines: 10
Distribution: World
Expires: January 1, 2001
Message-ID: <36c715eb.3062280@nntp.ix.netcom.com>
References: <7a4gv9$79@punk.akar.hol> <9697-36C665CF-61@newsd-221.iap.bryant.webtv.net>
Reply-To: toyboat*spamnotformeargentina*@bayarea.net
NNTP-Posting-Host: ali-ca14-45.ix.netcom.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-NETCOM-Date: Sun Feb 14 12:57:43 PM CST 1999
Summary: Drivel. Trust me.
X-Newsreader: Forte Agent 1.5/32.452
Xref: news.NMSU.Edu alt.punk:256068
The Nation of Ulysses - Plays Pretty for Baby
"Who ya f@#$%&' now!?!?"
-TEX
"Eigenlijk houd ik helemaal niet van punkmuziek"
-Sjoerd Goslinga
%%*******************************************
<ANN flame="", cert="", id="95:521.10:2" /ANN>
__Thread-ID: 10:2:3
__Thread-Members: 95:407 95:521 95:536
__Parent-Num: 95:407
__Build-Date: Thu Dec 2 20:29:49 MST 1999
__Built-By: dtappan
__Recoded-From: 95:234336
__Recoded-To: 95:521
Path: news.NMSU.Edu!lynx.unm.edu!newshub.tc.umn.edu!HSNX.callatg.com!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!newsfeed.cwix.com!news.airnews.net!cabal11.airnews.net!cabal1.airnews.net!news-f.iadfw.net!usenet
From: "Toneswine" <toneswine@geocities.com>
Newsgroups: rec.music.makers.marketplace,ne.forsale,alt.guitar.amps,rec.music.makers.guitar.acoustic
Subject: Re: TAYLOR 514C/KALAMAZOO AMP/MUSICMAN FA
Date: Mon, 4 Oct 1999 08:47:28 -0500
Organization: Internet Express (using Airnews.net!)
Lines: 16
Message-ID: <57599B8BA99A1A64.A3618EA383529935.F2B1815A895BBA6F@lp.airnews.net>
X-Orig-Message-ID: <7tab0j$j9c@library2.airnews.net>
References: <bugtussl-0310991752550001@10.0.2.15>
Abuse-Reports-To: newsadmin at netexpress.net to report improper postings
NNTP-Proxy-Relay: library2.airnews.net
NNTP-Posting-Time: Mon Oct 4 08:44:51 1999
NNTP-Posting-Host: !_k3k0RP"Xe%SEX (Encoded at Airnews!)
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 5.00.2314.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Xref: news.NMSU.Edu rec.music.makers.marketplace:234336 alt.guitar.amps:83929
rec.music.makers.guitar.acoustic:126015
$200.00 For a Kalamazoo....are you nuckn futz!
tbrc <bugtussl@world.std.com> wrote in message
news:bugtussl-0310991752550001@10.0.2.15...
> x-no-archive: yes
>
> Taylor 514 C
> http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=172012684
>
> 1966 Kalamazoo/Gibson Tube Combo Amp
> http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=171999144
>
> MusicMan Sabre 1 Project/Parts
> http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=172006820
%%*******************************************
<ANN flame="", cert="", id="4.5.25" /ANN>
From: farmerpaul@webtv.net (and am not dicke)
Subject: Re: Who is God ?
Newsgroups: soc.culture.usa
Date: Fri, 12 Nov 1999 21:35:48 -0700 (MST)
Organization: WebTV Subscriber
Path: news.NMSU.Edu!lynx.unm.edu!chaos.aoc.nrao.edu!newshost.nmt.edu!newshost.lanl.gov!logbridge.uoregon.edu!newsfeed.stanford.edu!paloalto-snf1.gtei.net!news.gtei.net!webtv.net!not-for-mail
Lines: 5
Message-ID: <29719-382CEAA4-51@storefull-626.iap.bryant.webtv.net>
References: <382B3FE5.2A4D@mindspring.com>
NNTP-Posting-Host: localhost.webtv.net
Mime-Version: 1.0 (WebTV)
Content-Type: Text/Plain; Charset=US-ASCII
Content-Transfer-Encoding: 7Bit
X-WebTV-Signature: 1
ETAuAhUAmMfteN9NxNdubaFUvhMK4BIC7+MCFQC2kX6AURDVNXwBHYRX0wGNKIth1g==
Content-Disposition: Inline
Xref: news.NMSU.Edu soc.culture.usa:275048
oh no, henry. god is bill clinton to you
a gun in the hand is worth two in the bush
%%*******************************************
Some issues to deal with when defining the problem and writing instructions
for annotators:
1. Included messages: when the included (quoted) message is a flame and
the current one is not, or vice versa,
a policy decision is required.
2. When does passionate or spirited disagreement cross the line to
flames? (ether discussion)
3. What if the attack is on someone not on the newsgroup and everyone agrees
(attacking Clinton on a
Republican newsgroup, attacking Bosnians on a Serb
newsgroup)? What if you can't tell whether the
person being attacked is on the newsgroup?
4. Dealing with humor and sarcasm in this context.
5. How newsgroup-specific should we be? (alt.music.hardcore, gaming
groups).
6. Does profanity mean a flame?
More general issues:
1. What level to tag at: newsgroup, message, sentence, phrase, word?
2. Getting the data
3. Deciding what form the data should be in and putting it in that
form.
4. What kind of agreement can we get from taggers?
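As one illustration of working with the data format, the sketch below reads the annotation line and the message separators as they appear in the examples above. The pattern is written only from those examples. That flame holds the tagger's judgment and cert the tagger's certainty is an assumption, and the function names are hypothetical.

```python
import re

# Pattern for the annotation line exactly as it appears in the examples.
# Assumption: "flame" is the tagger's judgment and "cert" the certainty.
ANN_PATTERN = re.compile(
    r'<ANN flame="([^"]*)", cert="([^"]*)", id="([^"]*)" /ANN>')


def parse_ann(line):
    """Return the annotation fields of an <ANN ...> line, or None."""
    match = ANN_PATTERN.search(line)
    if match is None:
        return None
    flame, cert, msg_id = match.groups()
    return {"flame": flame, "cert": cert, "id": msg_id}


def split_messages(text):
    """Split a corpus file into messages (lists of lines) on the
    separator lines that start with '%%*'."""
    messages, current = [], []
    for line in text.splitlines():
        if line.startswith("%%*"):
            if current:
                messages.append(current)
            current = []
        else:
            current.append(line)
    if current:
        messages.append(current)
    return messages


sample = ('%%*****\n'
          '<ANN flame="", cert="", id="4.5.25" /ANN>\n'
          'From: farmerpaul@webtv.net\n'
          '%%*****\n')
for message in split_messages(sample):
    print(parse_ann(message[0]))
```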
Kappa and Intercoder Reliability
In our search for ideal correct annotations or answers, we must account
for human fallibility. The
instructions will never be perfect and no one will be able to annotate
data 100% correctly, given
the instructions. In fact, there may be room for valid disagreement
between annotators about the
correct annotation. One way to deal with this is to have more
than one person annotate the data.
When this approach is taken we need to be able to evaluate agreement
among the annotators.
Why not just consider the percentage agreement? When might this work?
What is likely to happen
in the case of the flames annotations?
When the data is skewed, percentage agreement is misleading. For example,
flames make up approximately 15% of
our current newsgroup
data, and a previous annotation project (Project H 1993,
http://www.arch.usyd.edu.au/~fay/projecth.html) found 70 flames
in 3000 messages. With data like this, one could
get good percentage agreement if all messages were tagged as the predominant
category (non-flames, in our example).
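A quick calculation shows how large the effect is. The sketch below uses the Project H proportions quoted above (70 flames in 3000 messages); the two taggers are hypothetical, one who finds every flame and one who never marks any.

```python
def percent_agreement(tags_a, tags_b):
    """Fraction of items on which the two taggers gave the same tag."""
    return sum(a == b for a, b in zip(tags_a, tags_b)) / len(tags_a)


# Hypothetical taggers on data with 70 flames in 3000 messages.
careful = ["flame"] * 70 + ["non-flame"] * 2930
lazy = ["non-flame"] * 3000  # never marks a flame

print(round(percent_agreement(careful, lazy), 3))  # 0.977

# Agreement expected by chance if two independent taggers each marked
# 15% of the messages as flames: both say flame, or both say non-flame.
print(round(0.15 * 0.15 + 0.85 * 0.85, 3))  # 0.745
```

An agreement figure near 98% here reflects only the skew in the data, since the second tagger did no real work.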
In 1960, Cohen[1] developed an agreement measure called kappa. This
work was extended to more than two annotators
by Davies and Fleiss[2] in 1982.
Cohen's kappa for two annotators compares the observed agreement between the
annotators with the
chance agreement (the agreement expected if the ratings were statistically
independent). The difference is
normalized (divided) by the maximum possible agreement beyond
chance.
Let P(A) be the actual agreement,
let P(E) be the expected agreement,
then kappa = (P(A) - P(E))/ (1 - P(E))
A kappa value of 0.8 or higher indicates a high level of reliability
among the raters; values
between 0.67 and 0.8 indicate moderate agreement.
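The formula above can be sketched in Python for two annotators who tagged the same items. P(E) is estimated from each annotator's own tag frequencies, following the independence assumption; the function name and the toy taggers are assumptions for illustration.

```python
from collections import Counter


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators' tags over the same items.

    Undefined (division by zero) when P(E) is 1, i.e. when both
    annotators use a single, identical category for every item.
    """
    n = len(tags_a)
    # P(A): observed proportion of items with identical tags.
    p_a = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # P(E): chance agreement from each annotator's tag frequencies.
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    categories = freq_a.keys() | freq_b.keys()
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical taggers: one finds 70 flames in 3000 messages,
# the other tags every message as a non-flame.
careful = ["flame"] * 70 + ["non-flame"] * 2930
lazy = ["non-flame"] * 3000
print(cohen_kappa(careful, lazy))  # 0.0
```

Percentage agreement for this pair would be close to 98%, yet kappa is 0 because all of that agreement is expected by chance.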
We are going to do some simple examples to get a feel for kappa. If
you need to use kappa for
a more complicated situation, see the references given below.
 | Tagger 1: Y | Tagger 1: N | Total |
Tagger 2: Y | 8 | 4 | 12 |
Tagger 2: N | 8 | 86 | 94 |
Total | 16 | 90 | 106 |
 | Tagger 1: Y | Tagger 1: N | Total |
Tagger 2: Y | 7 | 0 | |
Tagger 2: N | 1 | 69 | |
Total | | | 77 |
 | Tagger 1: Y | Tagger 1: N | Total |
Tagger 2: Y | 50 | 18 | |
Tagger 2: N | 19 | 383 | |
Total | | | 470 |
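The three tables can be worked through with a short Python sketch. The function name and argument order are assumptions; each call passes the four cells of a table row by row (Tagger 2 Y row first), and the row and column totals are recomputed from the cells.

```python
def kappa_from_table(yy, yn, ny, nn):
    """Cohen's kappa from a 2x2 agreement table.

    yy: both taggers said Y       yn: Tagger 2 said Y, Tagger 1 said N
    ny: Tagger 2 said N, Tagger 1 said Y       nn: both said N
    """
    total = yy + yn + ny + nn
    p_a = (yy + nn) / total              # observed agreement P(A)
    t2_y, t2_n = yy + yn, ny + nn        # row totals (Tagger 2)
    t1_y, t1_n = yy + ny, yn + nn        # column totals (Tagger 1)
    p_e = (t1_y * t2_y + t1_n * t2_n) / total ** 2   # chance agreement P(E)
    return (p_a - p_e) / (1 - p_e)


for cells in [(8, 4, 8, 86), (7, 0, 1, 69), (50, 18, 19, 383)]:
    print(round(kappa_from_table(*cells), 3))
# prints 0.508, 0.926, 0.684
```

For the first table, P(A) = 94/106 and P(E) = (16*12 + 90*94)/106^2, which gives kappa of about 0.51 even though the taggers agree on almost 89% of the messages. By the thresholds given earlier, the second table shows high reliability and the third moderate agreement.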
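[1] Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales.
Educational and Psychological Measurement, 20:37-46.
[2] Davies, M. & Fleiss, J. (1982). Measuring Agreement for Multinomial
Data. Biometrics,
38:1047-1051.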
Bruce, R. & Wiebe, J. (1999) Recognizing Subjectivity: A Case Study
in Manual Tagging. Natural
Language Engineering, 1(1):1-16
Wiebe, J. Lecture Notes for CS 479/579, Spring 1998.
http://www.CS.NMSU.Edu/~wiebe/courses/CL/Sp98/index.html