Tom O'Hara Shell Script Collection
This directory contains a distribution of my collection of useful
shell script utilities, predominantly in Perl. Most of these were
developed at NMSU, for example in support of my thesis work (O'Hara
2005). Also see Thesis code.
Several of the scripts were developed in support of the GraphLing
project directed by Janyce Wiebe and Rebecca Bruce. However, some
were developed or refined at Cycorp (and thus are partially
copyrighted).
Links to individual scripts are shown below, along with a brief
description. In addition, a tar archive of the collection is
available:
tpo-useful-scripts.tar.gz.
The scripts are free in the GNU sense (see license below). However, if
you find the scripts useful, please make a small donation to the NMSU
Computer Science Department (or not so small, if really really
useful).
Tom O'Hara
Summer 2006
Thesis info
O'Hara (2005),
Empirical Acquisition of Conceptual Distinctions via Dictionary Definitions, PhD Dissertation, New Mexico State University, 2005.
Disclaimer
All code is freely available in the GNU sense:
Copyright (c) 2005 Tom O'Hara and New Mexico State University
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
See GNU_public_license.txt for details.
anova.lisp: xlispstat source for running an ANOVA over two sets of data
belief_network.perl: Perl module for producing belief networks in the formats required for BELIEF and MSBN (Microsoft Belief Network).
calc_cooccurrence.perl: Calculate co-occurrence statistics for the input, each line of which represents a 2x2 contingency table.
calc_entropy.perl: calculates entropy of a class using the output of a frequency tabulation program (in particular count_it.perl).
calc_multi_x2.perl: Calculate chi-square & G2 statistics for each line of the input, each representing a 2x2 contingency table.
calc_x2.perl: Calculates a simple X^2 (chi-square) independence test for a 2x2 contingency table
check_errors.perl: Scan the script execution trace log for errors, warnings and other suspicious results.
cmd.sh: runs the specified command, capturing stderr
common.perl: Commonly used perl routines (variables & constants)
count_it.perl: script to count the occurrences of a pattern in the input
cut.perl: cut columns like the cut utility but w/o line restrictions
derive_contexts.perl: Derive corpus-based contexts for each of the categories from Roget's Thesaurus
disambiguate_text.perl: Disambiguate the words in a text by finding the Roget's category that best matches the text's context and then selecting the sense that applies in the context of a Roget's category.
do_tex.sh: front-end script for running TeX, optionally w/ BibTeX
dobackup.sh: Make a copy of the specified files (in the current directory) into the ./backup subdirectory
extra.perl: common module with extraneous but occasionally useful functions
extract_lex_rels.perl: given a part-of-speech tagged file of dictionary definitions, extract the lexical relations implied, using pattern matching (e.g., 'GenusNP PASTPART ..
extract_synset_freq.perl: Extract the frequency for the WordNet synsets from the semantic concordance (SemCor) sense counts (cntlist).
feature2ml.perl: convert from tab-delimited feature specification to format used in the ML repository (c4.5-based)
filter_mail.perl: Ad hoc script to extract mail messages from a series of mail files
k_vec.perl: program to compute k-vectors for use in parallel corpus alignment.
kill_em.sh: script for killing processes specified by name or pattern
l_vec.perl: program to compute l-vectors for use in parallel corpus alignment.
misc_xlispstat.lisp: Miscellaneous supporting routines for using xlispstat for basic statistical analyses.
paste.perl: Utility similar to Unix paste command for joining columns
perlgrep.perl: quick 'n' dirty version of (e)grep via perl regex matching
prep_brill.perl: preprocess a file prior to part-of-speech tagging via Brill Tagger
ps_mine.sh: show processes belonging to a particular user (note: the processes are shown sorted by CPU and then by memory)
qd_eval.perl: quick & dirty evaluation of align_word.perl results
rcsput.perl: wrapper around rcs ci that handles locking
rename_files.perl: script for renaming a list of files by doing a simple pattern replacement (old to new), assuming Unix (or that the DOS version of the mv command is available).
roget.perl: misc script for working with the Roget categories
round.perl: rounds all numbers in the text (in place)
rup.sh: simulated rup using uptime
set_xterm_title.bash: Sets the xterm title bar to the string given on the command line, including support for Cygwin (via cmd /c title).
show_directory_tree.perl: recursively traverses the directory structure, creating a GraphViz graph displaying the subdirectory names, including support for symbolic links.
sum_file.perl: script to sum the first column of numbers in a file with support for optional statistical analysis via xlispstat
synch_directories.perl: synchronizes the files in two directories roughly in the manner of the Windows-based briefcase facility
test-perl-examples.perl: Script for performing unit tests on functions in perl scripts by checking for examples flagged with 'EX:' in the function header.
testwn.perl: script for testing the WordNet access module
web_freq.perl: check frequency of phrase on the web via search engine queries, currently HotBot or AltaVista
win32.perl: module for interfacing w/ the Win32 module
wordnet.perl: module for accessing the WordNet knowledge base (i.e., WordNet access functions)
xlispstat.sh: invokes the right version of xlispstat for the platform (Linux or Sun Solaris), using the required resource/working storage file (xlisp.wks)
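
Illustration of the 2x2 contingency table statistics mentioned above (calc_x2.perl, calc_multi_x2.perl). This is only a rough sketch in Python, not the code from the Perl scripts. The function names and the input format (four whitespace-separated cell counts per line, in the order a b c d for the table [[a, b], [c, d]]) are assumptions made for the example; the actual scripts may expect something different.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (X^2) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Product of the four marginal totals (two rows, two columns).
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero marginal means the statistic is undefined; report 0.0 here.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def g2_2x2(a, b, c, d):
    """Log-likelihood ratio statistic (G^2) for the same 2x2 table."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    observed = [a, b, c, d]
    # Expected count for each cell = (row total * column total) / n.
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)


def stats_per_line(lines):
    """Return (X^2, G^2) for each input line (hypothetical 'a b c d' format)."""
    results = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        a, b, c, d = (int(f) for f in fields)
        results.append((chi_square_2x2(a, b, c, d), g2_2x2(a, b, c, d)))
    return results


if __name__ == "__main__":
    for x2, g2 in stats_per_line(["10 20 30 40", "10 20 20 40"]):
        print(f"X2={x2:.4f}\tG2={g2:.4f}")
```

With one degree of freedom, an X^2 value above 3.84 is the usual threshold for rejecting independence at the 0.05 level.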