CS 575, Artificial Intelligence II
Lecture, January 28, 2000
Melanie Martin
 

1. Training and Testing Data
2. N-Fold Cross Validation
3. Annotation
        Example: Flame recognition in newsgroups
4. Kappa and Inter-Coder Reliability


Training and Testing Data

Recall that we want to build a classifier

        Training Data ----> Machine Learning Algorithm ---> Classifier

To build a probabilistic classifier, the ML algorithm estimates a probability model over the
    features f1, f2,..., fn and the class C:      p(f1, f2,..., fn, C)

So given an instance (f1, f2,..., fn) and the values p(f1, f2,..., fn, C1), p(f1, f2,..., fn, C2),..., p(f1, f2,..., fn, Ck),
    we choose the class Cj for which p(f1, f2,..., fn, Cj) is greatest.
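
The decision rule above can be sketched in a few lines. This is only an illustration: the joint table, its numbers, and the name classify are made up for the example, and a real model would be estimated from training data.

```python
# Hypothetical joint probabilities p(f1, f2, C) for two binary features
# and two classes. The numbers are invented and sum to 1.
joint = {
    (0, 0, "C1"): 0.02, (0, 0, "C2"): 0.48,
    (0, 1, "C1"): 0.03, (0, 1, "C2"): 0.17,
    (1, 0, "C1"): 0.04, (1, 0, "C2"): 0.16,
    (1, 1, "C1"): 0.06, (1, 1, "C2"): 0.04,
}


def classify(features, joint, classes):
    """Return the class Cj that maximizes p(f1, ..., fn, Cj)."""
    # Feature/class combinations missing from the table count as probability 0.
    return max(classes, key=lambda c: joint.get(tuple(features) + (c,), 0.0))


print(classify((1, 1), joint, ["C1", "C2"]))
print(classify((0, 0), joint, ["C1", "C2"]))
```

Comparing joint probabilities is enough here because p(f1, f2,..., fn) is the same for every class, so the class with the largest joint probability also has the largest conditional probability.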

So to build the classifier we need training data. We also need to evaluate the system somehow,
    i.e. Does it work???
For this we need test data.

For supervised learning, both training and test data need answers (the correct class for each
    instance). Perhaps such data already exists; otherwise it will have to be annotated by humans.

But first, do these two data sets really need to be separate???

Suppose you test the classifier on the same training data that you used to develop it. What happens?
    It should work very well, maybe too well, and you still don't really know anything. We need to try it
    on new data (test data) if we are to learn anything meaningful.
This leads to the problem of OVERFITTING: the probability model (more generally the classifier)
    fits the training data too well and will not generalize well to other data. So we need to find a balance
    between fitting the training data well and generalizing to other data. One way to do this is:


N-Fold Cross Validation

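Here we divide the training data into n equal parts. In each of n experiments, one part serves as
    pseudo test data and the remaining parts as pseudo training data. We take the model that does
    best on average across the n experiments.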

Let's look at an example of 3-fold cross validation:
    Divide the training data into 3 equal parts: A, B, C
 
 
    Experiment    Pseudo training data    Pseudo test data
    1             A, B                    C
    2             A, C                    B
    3             B, C                    A
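
A minimal sketch of the procedure, assuming the caller supplies a train function and an evaluate function. The names n_fold_splits, cross_validate, train_majority, and accuracy are hypothetical, and the "model" here is just the most common label, chosen only to keep the example short.

```python
from collections import Counter


def n_fold_splits(data, n):
    """Yield (pseudo_training, pseudo_test) pairs, holding out each part once."""
    data = list(data)
    # Part i gets items i, i+n, i+2n, ... so part sizes differ by at most one.
    folds = [data[i::n] for i in range(n)]
    for held_out in range(n):
        pseudo_test = folds[held_out]
        pseudo_training = [x for i, fold in enumerate(folds)
                           if i != held_out for x in fold]
        yield pseudo_training, pseudo_test


def cross_validate(data, n, train, evaluate):
    """Average the evaluation score over the n experiments."""
    scores = [evaluate(train(tr), te) for tr, te in n_fold_splits(data, n)]
    return sum(scores) / len(scores)


# Toy "learning algorithm": always predict the most common training label.
def train_majority(examples):
    return Counter(label for _, label in examples).most_common(1)[0][0]


def accuracy(model, examples):
    return sum(label == model for _, label in examples) / len(examples)


# With three items and n = 3, the splits are the three experiments in the table
# (in a different order: A is held out first).
for pseudo_training, pseudo_test in n_fold_splits(["A", "B", "C"], 3):
    print(pseudo_training, pseudo_test)

messages = [("m1", "N"), ("m2", "N"), ("m3", "Y"),
            ("m4", "N"), ("m5", "N"), ("m6", "N")]
print(cross_validate(messages, 3, train_majority, accuracy))
```

To compare several candidate models, one would run cross_validate once per candidate and keep the one with the best average score.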


Annotation

Getting started:
    Problem Definition
    Plans for acquiring and annotating the data
    Plans for evaluation

A case study: distinguishing flames in newsgroups

    Given the general definition that a flame is an abusive or insulting attack on a person on a
    newsgroup or on their beliefs, let's look at some examples:

WARNING : These examples are taken from actual newsgroups and in no way represent the thoughts,
    opinions, or beliefs of the research group or myself. You may find some of the following offensive.
 

%%*******************************************

<ANN flame="", cert="", id="97:908.11:1" /ANN>
__Thread-ID: 11:1:2
__Thread-Members:  97:908 97:269
__Build-Date: Thu Dec  2 20:40:48 MST 1999
__Built-By: dtappan
__Recoded-From: 97:219297
__Recoded-To: 97:908
Path: news.NMSU.Edu!lynx.unm.edu!newshub.tc.umn.edu!logbridge.uoregon.edu!newsswitch.lcs.mit.edu!remarQ-easT!rQdQ!supernews.com!remarQ.com!corp.supernews.com!not-for-mail
From: edromney <romney@edromney.com>
Newsgroups: rec.photo.equipment.35mm
Subject: Re: HOW WE PRE-SCREEN OUR CUSTOMERS
Date: Thu, 30 Sep 1999 13:45:07 -0500
Organization: romney
Lines: 35
Message-ID: <37F3AFB1.5193847@edromney.com>
References: <37F39CAC.83ACAC69@edromney.com> <19990930131815.15975.00000131@ng-bk1.aol.com>
Reply-To: romney@edromney.com
X-Complaints-To: newsabuse@supernews.com
X-Mailer: Mozilla 4.04 (Macintosh; U; PPC)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; x-mac-type="54455854"; x-mac-creator="4D4F5353"
Content-Transfer-Encoding: 7bit
Xref: news.NMSU.Edu rec.photo.equipment.35mm:219297
 
 

ArtKramr wrote:

> >Subject: HOW WE PRE-SCREEN OUR CUSTOMERS
> >From: edromney romney@edromney.com
> >Date: Thu, 30 September 1999 01:23
>
> >My Christian PR on this
> >newsgroup and elsewhere, the praying hands  image in our catalog, the
> >church steeple on our web page... all these seem to attract nice good
> >Christian people... and they also make bad people avoid us.
>
> Guess you won't do business with Jews, Muslims, Buhdists or any other  people
> who are not "nice good Christians". Is that right?

Learn to spell Buddhist, you ignorant liberal  dummy....you are all nothing but
a pack of cards...

%%*******************************************

<ANN flame="", cert="", id="13:97.2:2" /ANN>
__Thread-ID: 2:2:2
__Thread-Members: 13:10 13:97
__Parent-Num: 13:10
__Build-Date: Thu Dec  2 22:30:37 MST 1999
__Built-By: dtappan
__Recoded-From: 13:60747
__Recoded-To: 13:97
Path: news.NMSU.Edu!lynx.unm.edu!newshub.tc.umn.edu!logbridge.uoregon.edu!newsfeed.direct.ca!newspeer.monmouth.com!nntp2.deja.com!nnrp1.deja.com!not-for-mail
From: smi@my-deja.com
Newsgroups: alt.news.macedonia
Subject: Re: The census in Pirin Macedonia and entrails.
Date: Fri, 01 Oct 1999 14:24:38 GMT
Organization: Deja.com - Before you buy.
Lines: 34
Message-ID: <7t2g6o$vnm$1@nnrp1.deja.com>
References: <37ECF739.7591FD12@sympatico.ca> <7sogv3$pir$1@nnrp1.deja.com> <37F03DB4.F89C06A4@erols.com> <7sthnd$cah$1@nnrp1.deja.com> <7subrd$3eug$1@newssvr03-int.news.prodigy.com> <7svip5$qoe$1@nnrp1.deja.com> <7t10hq$257c$1@newssvr04-int.news.prodigy.com>
NNTP-Posting-Host: 208.145.41.12
X-Article-Creation-Date: Fri Oct 01 14:24:38 1999 GMT
X-Http-User-Agent: Mozilla/4.02 [en] (X11; I; SunOS 5.5.1 sun4u)
X-Http-Proxy: 1.0 bx6.deja.com:80 (Squid/1.1.22) for client 208.145.41.12
X-MyDeja-Info: XMYDJUIDsmi
Xref: news.NMSU.Edu alt.news.macedonia:60747

OK, could be dangerous, all women are in certain sense:-). No, seriously
this is supposed to be a discussion forum and everyone should be open to
discussion and discussion means be factual,  not just throwing words
around. If someone's trying to be dangerous, problem stays with him.
Thnx anyway.
smi

In article <7t10hq$257c$1@newssvr04-int.news.prodigy.com>,
  "June R Harton" <JUNEHARTON@prodigy.net> wrote:
> smi@my-deja.com wrote
> >"June R Harton" <JUNEHARTON@prodigy.net> wrote:
> >>>Also thanks to those few of you who abandantly supported the things
> >>>they said with facts,
> >>Galina's 'facts' are always lies.
> >I had in mind your facts, buddy, not hers. She didn't provide any.
> >But she was not cursing around, too.
>
> OK, sorry, but that woman is dangerous. She gives the apparancy
> of niceness then sticks the knife in and twists. Not content with
that,
> she sticks her arm in and pulls out the entrails.
>
> I am NOT joking. She is evil.
>
> from:  Spirit Of The Real Makedon
>          (using June's e-mail to communicate to you)!
>
> ........The heart of Macedonia was always Greek
>
>
 

Sent via Deja.com http://www.deja.com/
Before you buy.

%%*******************************************
<ANN flame="", cert="", id="19:647.236:1" /ANN>
__Thread-ID: 236:1:2
__Thread-Members: 19:647 19:649
__Build-Date: Thu Dec  2 23:13:56 MST 1999
__Built-By: dtappan
__Recoded-From: 19:256068
__Recoded-To: 19:647
Path: news.NMSU.Edu!lynx.unm.edu!newshub.tc.umn.edu!news.eecs.umich.edu!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!newsfeed.enteract.com!ix.netcom.com!news
From: toyboat*spamless*@bayarea.net (just call me "TEX")
Newsgroups: alt.punk
Subject: Re: I'm listening to
Date: Sun, 14 Feb 1999 18:55:39 GMT
Organization: Netcom
Lines: 10
Distribution: World
Expires: January 1, 2001
Message-ID: <36c715eb.3062280@nntp.ix.netcom.com>
References: <7a4gv9$79@punk.akar.hol> <9697-36C665CF-61@newsd-221.iap.bryant.webtv.net>
Reply-To: toyboat*spamnotformeargentina*@bayarea.net
NNTP-Posting-Host: ali-ca14-45.ix.netcom.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-NETCOM-Date: Sun Feb 14 12:57:43 PM CST 1999
Summary: Drivel.  Trust me.
X-Newsreader: Forte Agent 1.5/32.452
Xref: news.NMSU.Edu alt.punk:256068
 

The Nation of Ulysses - Plays Pretty for Baby

"Who ya f@#$%&' now!?!?"

-TEX
 

"Eigenlijk houd ik helemaal niet van punkmuziek"
-Sjoerd Goslinga

%%*******************************************
<ANN flame="", cert="", id="95:521.10:2" /ANN>
__Thread-ID: 10:2:3
__Thread-Members: 95:407 95:521 95:536
__Parent-Num: 95:407
__Build-Date: Thu Dec  2 20:29:49 MST 1999
__Built-By: dtappan
__Recoded-From: 95:234336
__Recoded-To: 95:521
Path: news.NMSU.Edu!lynx.unm.edu!newshub.tc.umn.edu!HSNX.callatg.com!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!newsfeed.cwix.com!news.airnews.net!cabal11.airnews.net!cabal1.airnews.net!news-f.iadfw.net!usenet
From: "Toneswine" <toneswine@geocities.com>
Newsgroups: rec.music.makers.marketplace,ne.forsale,alt.guitar.amps,rec.music.makers.guitar.acoustic
Subject: Re: TAYLOR 514C/KALAMAZOO AMP/MUSICMAN FA
Date: Mon, 4 Oct 1999 08:47:28 -0500
Organization: Internet Express (using Airnews.net!)
Lines: 16
Message-ID: <57599B8BA99A1A64.A3618EA383529935.F2B1815A895BBA6F@lp.airnews.net>
X-Orig-Message-ID: <7tab0j$j9c@library2.airnews.net>
References: <bugtussl-0310991752550001@10.0.2.15>
Abuse-Reports-To: newsadmin at netexpress.net to report improper postings
NNTP-Proxy-Relay: library2.airnews.net
NNTP-Posting-Time: Mon Oct  4 08:44:51 1999
NNTP-Posting-Host: !_k3k0RP"Xe%SEX (Encoded at Airnews!)
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 5.00.2314.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Xref: news.NMSU.Edu rec.music.makers.marketplace:234336 alt.guitar.amps:83929 rec.music.makers.guitar.acoustic:126015

$200.00 For a Kalamazoo....are you nuckn futz!

tbrc <bugtussl@world.std.com> wrote in message
news:bugtussl-0310991752550001@10.0.2.15...
> x-no-archive: yes
>
> Taylor 514 C
> http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=172012684
>
> 1966 Kalamazoo/Gibson Tube Combo Amp
> http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=171999144
>
> MusicMan Sabre 1 Project/Parts
> http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=172006820
 
 

%%*******************************************

<ANN flame="", cert="", id="4.5.25" /ANN>
From: farmerpaul@webtv.net (and am not dicke)
Subject: Re: Who is God ?
Newsgroups: soc.culture.usa
Date: Fri, 12 Nov 1999 21:35:48 -0700 (MST)
Organization: WebTV Subscriber
Path: news.NMSU.Edu!lynx.unm.edu!chaos.aoc.nrao.edu!newshost.nmt.edu!newshost.lanl.gov!logbridge.uoregon.edu!newsfeed.stanford.edu!paloalto-snf1.gtei.net!news.gtei.net!webtv.net!not-for-mail
Lines: 5
Message-ID: <29719-382CEAA4-51@storefull-626.iap.bryant.webtv.net>
References: <382B3FE5.2A4D@mindspring.com>
NNTP-Posting-Host: localhost.webtv.net
Mime-Version: 1.0 (WebTV)
Content-Type: Text/Plain; Charset=US-ASCII
Content-Transfer-Encoding: 7Bit
X-WebTV-Signature: 1
 ETAuAhUAmMfteN9NxNdubaFUvhMK4BIC7+MCFQC2kX6AURDVNXwBHYRX0wGNKIth1g==
Content-Disposition: Inline
Xref: news.NMSU.Edu soc.culture.usa:275048
 

oh no, henry. god is bill clinton to you

a gun in the hand is worth two in the bush

%%*******************************************

Some issues to deal with when defining the problem and writing instructions for annotators:
1. Included messages: when the included message is a flame and the current one is not, or vice versa,
    a policy decision is required.
2. When does passionate or spirited disagreement cross the line into flames? (ether discussion)
3. What if the attack is on someone not on the newsgroup and everyone agrees (attacking Clinton on a
    Republican newsgroup, attacking Bosnians on a Serb newsgroup)? What if you can't tell whether the
    person being attacked is on the newsgroup?
4. Dealing with humor and sarcasm in this context.
5. How newsgroup-specific should we be? (alt.music.hardcore, gaming groups)
6. Does profanity mean a flame?
 

More general issues:
1. What level to tag at: newsgroup, message, sentence, phrase, word?
2. Getting the data
3. Deciding what form the data should be in and putting it in that form.
4. What kind of agreement can we get from taggers?


Kappa and Intercoder Reliability

In our search for ideal correct annotations or answers, we must account for human fallibility. The
instructions will never be perfect and no one will be able to annotate data 100% correctly, given
the instructions. In fact, there may be room for valid disagreement between annotators about the
correct annotation. One way to deal with this is to have more than one person annotate the data.
When this approach is taken we need to be able to evaluate agreement among the annotators.

Why not just consider the percentage agreement? When might this work? What is likely to happen
in the case of the flames annotations?

When the data is skewed, percentage agreement is misleading. For example, flames make up
approximately 15% of our current newsgroup data, and a previous annotation project (Project H, 1993,
http://www.arch.usyd.edu.au/~fay/projecth.html) found 70 flames in 3000 messages. With data like
this, annotators would get good percentage agreement even if they simply tagged all messages as the
predominant category (non-flames, in our example).
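
A rough sketch of how far skew alone can push percentage agreement. It assumes, purely for illustration, two annotators who ignore the message content and each tag a message as a flame with probability equal to the flame rate, independently of each other. The function name is hypothetical.

```python
def chance_percentage_agreement(flame_rate):
    """Expected agreement between two annotators who each tag a message as a
    flame with probability flame_rate, independently of each other."""
    # They agree when both say "flame" or both say "non-flame".
    return flame_rate ** 2 + (1 - flame_rate) ** 2


ours = chance_percentage_agreement(0.15)            # flame rate in our data
project_h = chance_percentage_agreement(70 / 3000)  # flame rate in Project H

print(f"{ours:.3f}")
print(f"{project_h:.3f}")
```

Under these assumptions the annotators agree on about 74.5% of messages at a 15% flame rate and on about 95.4% at the Project H rate, without reading a single message. Two annotators who tag every message as a non-flame agree 100% of the time.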

In 1960, Cohen [1] developed an agreement measure called kappa. This work was extended to more
than two annotators by Davies and Fleiss [2] in 1982.

Cohen's kappa for two annotators compares the observed agreement between the annotators with the
chance agreement (the agreement expected if the ratings were statistically independent). The
difference is normalized (divided) by the maximum possible agreement beyond chance, that is,
1 minus the chance agreement.

Let P(A) be the actual agreement,
let P(E) be the expected agreement,
then kappa = (P(A) - P(E))/ (1 - P(E))

A kappa value of 0.8 or higher indicates a high level of reliability among the raters; values
between 0.67 and 0.8 indicate moderate agreement.

We are going to do some simple examples to get a feel for kappa. If you need to use kappa for
a more complicated situation, see the references given below.
 
 
                    Tagger 1
                    Y      N      Total
    Tagger 2   Y    8      4      12
               N    8      86     94
           Total    16     90     106

P(A) = (8 + 86)/106 = 0.8868
P(E) = (12*16 + 90*94)/(106*106) = 0.7700
kappa = (0.8868 - 0.7700)/(1 - 0.7700) = 0.5078
    (0.5077 if P(A) and P(E) are not rounded first)
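
The calculation for the table above can be sketched as a small function. The name cohen_kappa and the nested-list layout (rows for Tagger 2, columns for Tagger 1) are choices made for this example, not a standard API.

```python
def cohen_kappa(table):
    """Return (P(A), P(E), kappa) for a square two-annotator contingency table.

    table[i][j] is the number of items that Tagger 2 put in category i
    and Tagger 1 put in category j.
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Actual agreement: the share of items on the diagonal.
    p_a = sum(table[i][i] for i in range(k)) / total
    # Expected agreement if the two taggers' ratings were independent.
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    return p_a, p_e, (p_a - p_e) / (1 - p_e)


p_a, p_e, kappa = cohen_kappa([[8, 4],
                               [8, 86]])
print(f"P(A) = {p_a:.4f}, P(E) = {p_e:.4f}, kappa = {kappa:.3f}")
```

For this table the function gives P(A) of about 0.8868, P(E) of about 0.7700, and kappa of about 0.508. The same call works for any square table of counts, provided P(E) is less than 1.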
 
 
                    Tagger 1
                    Y      N      Total
    Tagger 2   Y    7      0
               N    1      69
           Total                  77

 
 
                    Tagger 1
                    Y      N      Total
    Tagger 2   Y    50     18
               N    19     383
           Total                  470

 
 



[1] Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational and Psychological
    Measurement, 20:37-46.

[2] Davies, M. & Fleiss, J. (1982). Measuring Agreement for Multinomial Data. Biometrics,
    38:1047-1051.

Bruce, R. & Wiebe, J. (1999). Recognizing Subjectivity: A Case Study in Manual Tagging. Natural
    Language Engineering, 1(1):1-16.

Wiebe, J. Lecture Notes for CS 479/579, Spring 1998.
    http://www.CS.NMSU.Edu/~wiebe/courses/CL/Sp98/index.html