Dr Nic

Nicola Stokes

Postdoctoral Researcher,

School of Computer Science and Informatics,

College of Engineering Mathematical & Physical Sciences,

University College Dublin,

Belfield, Dublin 4

Ireland.

[nicola.stokes at ucd.ie]


Research Interests

I am interested in the development of robust linguistic analysis techniques (e.g. Lexical Cohesion Analysis, Textual Entailment and Paraphrase Identification, Temporal Expression Analysis) for use in NLP and IR applications such as Text Summarisation, Question Answering and Text Classification. I've recently started working on biomedical text processing tasks such as Genomic IR and Gene Normalisation.

 

Current Post and Academic Background

I am currently working as a researcher in the School of Computer Science and Informatics, UCD.

From 2005 to mid-2008, I was a postdoctoral research fellow in the National ICT Australia (NICTA) Victoria Lab at the University of Melbourne. During this time I worked with Steven Bird, Tim Baldwin, Lawrence Cavedon, James Bailey, Alistair Moffat, Justin Zobel and other members of the Interactive Information Discovery and Delivery project (I2D2). I2D2 has recently changed focus and is now called the BioTala project, which is part of the Network Information Processing Program at the NICTA Victoria Lab. 

I completed my PhD in 2004 under the supervision of Prof. Joe Carthy at the Intelligent Information Retrieval group in the Department of Computer Science, University College Dublin. My thesis investigates the appropriateness of using lexical cohesion analysis (provided by lexical chains) to improve IR and NLP tasks in the Topic Detection and Tracking domain. During the course of this work I focussed on three separate tasks: New Event Detection (i.e. the detection of breaking news stories as they arrive on a news stream), News Story Segmentation (i.e. the identification of boundaries between adjacent news stories in a broadcast news programme transcript) and News Story Gisting (i.e. the generation of single-sentence news story summaries) for broadcast news and newswire data. After completing my PhD, and before moving to Melbourne, I held a 1-year postdoc position in UCD where I worked with the UCD Summarisation group.

In 2001 I spent a semester at the Center for Intelligent Information Retrieval (CIIR) at UMass working with James Allan and Victor Lavrenko on the New Event Detection task. From 2002-2003 my work on News Story Segmentation and Gisting was motivated by collaborative work with Alan Smeaton and his group on the Fischlar News Stories system at the Centre for Digital Video Processing, Dublin City University. 

 

Publications

PhD Thesis 

Nicola Stokes. Applications of Lexical Cohesion Analysis in the Topic Detection and Tracking Domain. Department of Computer Science, University College Dublin, April 2004. [pdf]  [zip] 

Journal Papers

Bader Aljaber, Nicola Stokes, James Bailey, Jian Pei. Document Clustering of Scientific Texts using Citation Contexts. Information Retrieval, 13(2):101-131, 2010. [pdf]

Martina Naughton, Nicola Stokes, Joe Carthy. Sentence-level Event Classification in Unstructured Texts. Information Retrieval, 13(2):132-156, 2010. [pdf]

Robin Boutros, Nicola Stokes, Micheal Bekaert, Emma C. Teeling. UniPrime2: a web service providing easier Universal Primer Design. In Nucleic Acids Research, 2009. [full-paper][UniPrime2]

Nicola Stokes, Yi Li, Lawrence Cavedon, Justin Zobel. Exploring criteria for successful query expansion in the Genomic domain. Information Retrieval, 12:17-50, 2009. [pdf]

Nicola Stokes, Yi Li, Alistair Moffat, Jiawen Rong. An empirical study of the effects of NLP components on Geographic IR performance. In the special issue on Geographic Information Retrieval, International Journal of Geographical Information Science, 22(3):247-264, 2008. [pdf]

Eamonn Newman, Joe Carthy, John Dunnion, Nicola Stokes. Identifying Semantic Equivalence for Multi-document Summarisation. Artificial Intelligence Review, 25(1-2):55-65, 2006. [pdf]

Nicola Stokes, Joe Carthy, Alan F. Smeaton. SeLeCT: A Lexical Cohesion based News Story Segmentation System. In the Journal of AI Communications, 17(1):3-12, 2004. [pdf]

Book Reviews

Nicola Stokes. William Hersh: Information retrieval: a health and biomedical perspective, 3rd ed - Book Review. Information Retrieval, published online June 2009.

Nicola Stokes. TREC: Experiment and Evaluation in Information Retrieval - Book Review. Computational Linguistics, Vol. 32, No. 4, pp. 563-567, 2006.

Conference and Workshop Papers

2008

Bader Aljaber, Nicola Stokes, James Bailey, Yi Li. Exploring the benefit of contextual information for boosting TREC Genomic IR performance. To appear in the proceedings of the Australasian Document Computing Symposium (ADCS), 2008. [pdf]

Martina Naughton, Nicola Stokes, Joe Carthy. Investigating Statistical Techniques for Sentence-level Event Classification. In the proceedings of Coling 2008. [pdf]

2007

Nicola Stokes, Yi Li, Lawrence Cavedon, Justin Zobel. Exploring abbreviation expansion for Genomic Information Retrieval. In the proceedings of the Australasian Language Technology Workshop, 2007. [Best Paper Award] [pdf]

Nicola Stokes, Yi Li, Lawrence Cavedon, Eric Huang, Jiawen Rong, Justin Zobel. Entity-based relevance feedback for genomic list answer retrieval. In the proceedings of the TREC Genomics Track, 2007. [pdf]

Benjamin Goudey, Nicola Stokes, David Martinez. Exploring extensions to machine learning-based Gene Normalisation. In the Proceedings of the Australasian Language Technology Workshop, 2007. [pdf]

Yi Li, Nicola Stokes, Lawrence Cavedon, Alistair Moffat. NICTA I2D2 Group at GeoCLEF 2006. In Evaluation of Multilingual and Multi-modal Information Retrieval, LNCS, Springer, Vol. 4730/2007, pp. 938-945, 2007. [pdf]

Nicola Stokes, Jiawen Rong, Lawrence Cavedon. NICTA's Update and Question-based Summarisation Systems at DUC 2007. In the Proceedings of the Document Understanding Conference Workshop, 2007. [pdf]

2006

Yi Li, Nicola Stokes, Lawrence Cavedon, Alistair Moffat. NICTA I2D2 Group at GeoCLEF 2006. In the Proceedings of the GeoCLEF Workshop on Geo-Spatial IR, Alicante, Spain, 2006. [pdf]

Yi Li, Alistair Moffat, Nicola Stokes, Lawrence Cavedon. Exploring probabilistic toponym resolution for geographical information retrieval. In the Proceedings of SIGIR Workshop on Geographical Information Retrieval, pp. 17-22, 2006. [pdf]

Jeremy Nicholson, Nicola Stokes, Tim Baldwin. Detecting entailment using an extended implementation of the basic elements overlap metric. In the Proceedings of the Second Pascal Recognising Textual Entailment Challenge (RTE2), Venice, pp. 122-7, 2006. [pdf]

Eamonn Newman, Nicola Stokes, Joe Carthy, John Dunnion. Textual Entailment Recognition Using a Linguistically-Motivated Decision Tree Classifier. Machine Learning Challenges (First PASCAL Machine Learning Challenges Workshop, MLCW 2005, Revised Selected Papers), Lecture Notes in Computer Science, Springer, pp. 372-384, 2006. [pdf]

2005

Nicola Stokes, Eamonn Newman. Multi-document Summarisation and the PASCAL Textual Entailment Challenge. In the proceedings of the Australasian Language Technology Workshop 2005 (ALTW 2005), Australasian Language Technology Association, pp. 215-223, 2005. [pdf]

Eamonn Newman, Nicola Stokes, Joe Carthy, John Dunnion. UCD IIRG Approach to the Textual Entailment Challenge. In the Proceedings of the PASCAL Recognising Textual Entailment Challenge, April 2005. [pdf]

Ruichao Wang, Nicola Stokes, William Doran, Eamonn Newman, Joe Carthy, John Dunnion. Comparing Topiary-style approaches to Headline Generation. In the Proceedings of the 27th European Conference on Information Retrieval (ECIR-05), Santiago de Compostela, Spain, March 2005. [pdf]

Ruichao Wang, Nicola Stokes, William Doran, Eamonn Newman, Joe Carthy, John Dunnion. News Headline Generation Based on Linguistic Methods. In the Proceedings of the IASTED International Conference on Artificial Intelligence and Applications (AIA 2005), Innsbruck, Austria, January 2005.

Ruichao Wang, Nicola Stokes, William Doran, Eamonn Newman, John Dunnion, Joe Carthy. LexTrim: A Lexical Cohesion based Approach to Parse-and-Trim Style Headline Generation. In the Proceedings of the 6th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2005), Mexico City, January 2005. [Best Poster Award] [pdf]

2004

Eamonn Newman, William Doran, Nicola Stokes, Joe Carthy, John Dunnion. Examination of Similarity Metrics for Redundancy Removal in Multi-Document Summarisation. In the Proceedings of the 15th AICS Conference, pp. 292-301, Castlebar, Co. Mayo, September 2004.

Eamonn Newman, William Doran, Nicola Stokes, Joe Carthy, John Dunnion. Comparing Redundancy Removal Techniques for Multi-document Summarisation. In the Proceedings of STAIRS, pp. 223-228, August 2004. [pdf]

William P. Doran, Nicola Stokes, Eamonn Newman, John Dunnion, Joe Carthy. A Hybrid Statistical/Linguistic approach to News Story Gisting. In the Proceedings of the 27th ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 464-465, July 2004. [pdf]

William P. Doran, Nicola Stokes, Eamonn Newman, John Dunnion, Joe Carthy, Fergus Toolan. News Story Gisting at University College Dublin. In the Proceedings of the Document Understanding Conference (DUC), 2004.  [pdf]

Nicola Stokes, Eamonn Newman, Joe Carthy, Alan F. Smeaton. Broadcast News Gisting using Lexical Cohesion Analysis. In the Proceedings of the 26th European Conference on Information Retrieval (ECIR-04), pp. 209-222, Sunderland, U.K., 2004. [pdf]

William P. Doran, Nicola Stokes, John Dunnion, Joe Carthy. Assessing the Impact of Lexical Chain Scoring Methods and Sentence Extraction Schemes on Summarization. In the Proceedings of the 5th International conference on Intelligent Text Processing and Computational Linguistics CICLing-2004, 2004. [pdf]

William P. Doran, Nicola Stokes, John Dunnion, Joe Carthy. Comparing Lexical Chain-based Summarisation Approaches using an Extrinsic Evaluation. In the Proceedings of the Global WordNet Conference (GWC 2004), 2004. [pdf]

2003

Nicola Stokes. Spoken and Written News Story Segmentation using Lexical Chaining. In the Proceedings of the Student Workshop at HLT-NAACL 2003, Companion Volume, pp. 49-54, Edmonton, Canada, 2003. [pdf]

2002

Nicola Stokes, Joe Carthy, Alan F. Smeaton. Segmenting Broadcast News Streams using Lexical Chaining. In the Proceedings of STAIRS 2002, Vol. 1, IOS Press, Ed. T. Vidal and P. Liberatore, pp. 145-154, Lyon, France, 2002. [Best Paper Award] See journal paper (Stokes et al., 2004).

2001

Nicola Stokes, Joe Carthy. Combining Semantic and Syntactic Document Classifiers to Improve First Story Detection. In the Proceedings of the 24th Annual ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 424-425, 2001. [pdf]

Nicola Stokes, Joe Carthy. Using Data Fusion to Improve First Story Detection. In the Proceedings of the 23rd BCS-IRSG European Colloquium IR Research, pp. 78-90, 2001.

Nicola Stokes, Joe Carthy. First Story Detection using a Composite Document Representation. In the Proceedings of HLT 2001, Human Language Technology Conference, pp. 134-141, 2001. [pdf]

2000

Nicola Stokes, Paula Hatch, Joe Carthy. Lexical Chaining for Web-Based Retrieval of Breaking News. In the Proceedings of the International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems AH2000, pp. 327-330, 2000. 

Nicola Stokes, Paula Hatch, Joe Carthy. Lexical Semantic Relatedness and Online News Event Detection. In the Proceedings of the 23rd Annual ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 324-325, 2000.  [pdf]

Nicola Stokes, Paula Hatch, Joe Carthy. Topic Detection, a new application for lexical chaining?  In the Proceedings of the 22nd BCS IRSG Colloquium, pp. 94-103, 2000. [pdf]

 

Awards

  • Best Paper Award at the Australasian Language Technology Workshop, 2007.

  • Best Poster Award at the International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), Mexico City, 2005.

  • Presenter at Science Uncovered 2005 at UCD 150 celebrations. [Press Coverage]

  • Best Paper Award at the Starting AI Researchers Symposium (STAIRS 2002), Lyon, France.

Activities

Organising Committee Member
  • Area Chair for Information Extraction and Summarisation theme, SIGIR 2009.

  • Technical Chair, Australasian Language Technology Workshop 2008.

  • Local Chair, Australasian Language Technology Workshop 2007.

  • ACL/HCSNet Advanced Program in Natural Language Processing, July 10th-14th 2006. Co-chair (with Steven Bird)

  • HLT/NAACL 2004 Student Workshop. Co-chair

  • IMTP-04 (First International Workshop on Incident Management: Theory and Practice) 

Programme Committee Member

Journal Reviewer
  • Journal of Information Retrieval

  • ACM TSLP Transactions on Speech and Language Processing

  • Journal of Language Resources and Evaluation

  

Last updated 30/04/09