[Comp-neuro] [Big Data NLP Workshop @ IEEE Big Data 2016] Call For Papers

Teng Teck Hou dengdehao at gmail.com
Wed Aug 24 16:20:50 CEST 2016

Big Data and Natural Language Processing workshop hosted at IEEE Big Data



The modality of textual data has been somewhat under-represented in big data
and data science research thus far. This is despite the fact that large
amounts of data are stored in unstructured textual format. We intend that
this workshop will address this shortcoming and bring together academic and
industrial researchers to exchange cutting-edge research in the emerging
area of extremely large-scale natural language processing (NLP). This topic
has emerged in several areas in parallel in recent years: information
retrieval and search engines, text mining, machine learning, web-derived
corpus/computational linguistics, digital libraries, and high-performance and
parallel computing. Common to all these areas is some or all of the main
parts of the NLP pipeline: collection, cleaning, annotation, indexing,
storage, retrieval and analysis of voluminous quantities of naturally
occurring language data from the web or large-scale national and
international digitisation initiatives. By hosting this event at IEEE Big
Data 2016, we hope to encourage the communities to come together to consider
synergies between NLP and data science.


In this context, numerous issues should be considered, including those linked
to the five Vs of big data: (a) Volume: is having more data for training and
testing NLP techniques always better? (b) Variety: are all types of data
available on a sufficiently large scale? (c) Velocity: how are parallel
methods best applied to carry out NLP on a large scale? (d) Variability: how
does inconsistent data impact on the accuracy of NLP techniques? (e)
Veracity: how does the accuracy of data affect inferences that can be drawn
from it?


Research topics:

Topics covered by the workshop include, but are not restricted to, the
following:

Application-focused papers, e.g. security informatics

Crowdsourcing approaches to large-scale language analysis

Use of big data to train/test methods for low-resource languages for which
NLP approaches do not yet exist

Efficient NLP for analysing large data sets

Challenges of scaling the NLP pipeline

Big Data Management for NLP

Storage and access for large linguistic data sets

Language processing via GPGPUs

Parallel and distributed computing techniques for language analysis, e.g.
HPC, MapReduce, Hadoop, Spark and cloud-based machine learning

Visualisation methods for the analysis of large corpora



Important dates:

Oct 3, 2016: Due date for full workshop paper submission

Oct 25, 2016: Notification of paper acceptance to authors

Nov 8, 2016: Camera-ready versions of accepted papers due

Dec 5-8, 2016: Workshops


Program Chairs:

Dr Paul Rayson (Lancaster University, UK)

Dr Mark Stevenson (Sheffield University, UK)

Dr John Mariani (Lancaster University, UK)

Dr Laura Irina Rusu (IBM Research Australia)

Gandhi Sivakumar (Watson CoC, IBM Australia)


Program committee members:

Dr Nikos Aletras (Amazon UK)

Dr Enrique Alfonseca (Google Zurich)

Professor Laurence Anthony (Waseda University, Japan)

Dr Piotr Banski (IDS-Mannheim, Germany)

Dr Alistair Baron (Lancaster University, UK)

Dr Eddie Bell (Lyst, UK)

Matt Coole (Lancaster University, UK)

Professor John Keane (University of Manchester, UK)

Dr Dawn Knight (Cardiff University, UK)

Dr Marc Kupietz (IDS-Mannheim, Germany)

Dr Jochen Leidner (Thomson Reuters, UK)

Dr Diana Maynard (Sheffield University, UK)

Dr Rao Muhammad Adeel Nawab (COMSATS, Pakistan)

Dr Sebastian Riedel (UCL, UK)

Dr Mahsa Salehi (IBM Research, Australia)

Dr Irena Spasic (Cardiff University, UK)

Dr Stephen Wattam (Lancaster University, UK)
