Named Entity Recognition
Discussion of the State of the Art
Named Entity Recognizers relied for a long time on algorithms based on statistical models such as the Hidden Markov Model (HMM) [1][2] and models in the Conditional Random Fields (CRFs) class [3][4]. For instance, the widely used Stanford Named Entity Recognizer [5] uses CRFs.

In recent years, however, advances in both GPU technology and deep learning techniques have triggered the advent of architectures built on Long Short-Term Memory neural networks (LSTMNNs) [6]. LSTMNNs are often used in conjunction with CRFs and achieve performance equivalent to the aforementioned methods [7][8] without the need for complex feature engineering [7][9][10]. They have become a fundamental component for major companies like Google, which prefer them over earlier approaches [11]. It is worth noting that a recent paper [12] introduces Iterated Dilated Convolutional Neural Networks (ID-CNNs) as a replacement for LSTMNNs: by fully exploiting the GPU's parallel architecture, they drastically reduce computation time while keeping the same level of accuracy, which suggests ID-CNNs could be the next step in improving NER. Minimal sketches of the CRF, LSTM, and ID-CNN approaches are given at the end of this section.

As for the current state of NER research, matters of great concern include training-data scarcity and inter-domain generalization [13]. To perform well on a language domain, current NER systems need large labeled datasets related to that domain [14]. Such training data is not available for every domain, which makes it impossible to apply NER to those domains effectively. Furthermore, if a domain does not follow strict language conventions and allows a wide range of usage, the model will fail to generalize because of excessive heterogeneity in the data; Sport and Finance are examples of such domains. This is why one of the big challenges is "adapt[ing] models learned on domains where large amounts of annotated training data are available to domains with scarce annotated data" [13].
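As a concrete illustration of the CRF-based approach mentioned above, the Stanford Named Entity Recognizer [5] can be called from Python through NLTK's wrapper. This is a minimal sketch, assuming the Stanford NER jar and the English three-class CRF model have already been downloaded from the Stanford NER website; both file paths below are placeholders for wherever they live on disk.

```python
from nltk.tag.stanford import StanfordNERTagger

# Placeholder paths: both files must be downloaded separately (see [5]).
tagger = StanfordNERTagger(
    'english.all.3class.distsim.crf.ser.gz',  # pretrained CRF model
    'stanford-ner.jar',                       # Stanford NER jar
)

tokens = 'Angela Merkel visited Google in California'.split()
print(tagger.tag(tokens))
# e.g. [('Angela', 'PERSON'), ('Merkel', 'PERSON'), ('visited', 'O'),
#       ('Google', 'ORGANIZATION'), ('in', 'O'), ('California', 'LOCATION')]
```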
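The LSTM-CRF taggers of [7] and [8] share a common backbone: token embeddings feed a bidirectional LSTM whose hidden states are projected to per-tag emission scores, and a CRF layer then decodes the best tag sequence jointly. The PyTorch sketch below covers only the BiLSTM emission part; the dimensions and tag count are illustrative rather than the papers' exact settings, and the CRF layer (as well as the character-level features used in [7] and [10]) is omitted.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Minimal BiLSTM emission model. The cited LSTM-CRF taggers add a
    CRF layer on top of these scores instead of a per-token softmax."""

    def __init__(self, vocab_size, tagset_size, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional: hidden_dim is split between the two directions.
        self.lstm = nn.LSTM(emb_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)
        self.emit = nn.Linear(hidden_dim, tagset_size)

    def forward(self, token_ids):      # token_ids: (batch, seq_len)
        x = self.embed(token_ids)      # (batch, seq_len, emb_dim)
        h, _ = self.lstm(x)            # (batch, seq_len, hidden_dim)
        return self.emit(h)            # per-token emission scores

# Toy usage: a 12-token "sentence" over a 10,000-word vocabulary and
# 9 tags (e.g. BIO tags for the four CoNLL-2003 entity types plus O).
model = BiLSTMTagger(vocab_size=10_000, tagset_size=9)
scores = model(torch.randint(0, 10_000, (1, 12)))
print(scores.shape)                    # torch.Size([1, 12, 9])
```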
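For contrast, the ID-CNN architecture of [12] replaces the LSTM's sequential scan with stacked dilated convolutions: the receptive field grows exponentially with depth, so a few layers cover a whole sentence while all tokens are processed in parallel on the GPU. A minimal sketch of one such block, with illustrative sizes rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    """One block of stacked dilated 1-D convolutions (dilations 1, 2, 4).
    Padding is chosen so the sequence length is preserved at every layer."""

    def __init__(self, channels=128, dilations=(1, 2, 4)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=3, dilation=d, padding=d)
            for d in dilations
        )

    def forward(self, x):              # x: (batch, channels, seq_len)
        for conv in self.convs:
            x = torch.relu(conv(x))
        return x

# In [12] the same block is reapplied ("iterated") with shared weights
# several times before per-token tag scores are emitted.
block = DilatedBlock()
x = torch.randn(1, 128, 50)            # one 50-token sentence, 128-dim features
for _ in range(4):                     # iterate the block; weights are shared
    x = block(x)
print(x.shape)                         # torch.Size([1, 128, 50])
```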
Bibliography
1. Dan Klein, Joseph Smarr, Huy Nguyen, and Christopher D. Manning. 2003. Named entity recognition with character-level models. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 (CoNLL '03), Vol. 4. Association for Computational Linguistics, Stroudsburg, PA, USA, 180-183.
2. Dan Shen, Jie Zhang, Guodong Zhou, Jian Su, and Chew-Lim Tan. 2003. Effective adaptation of a Hidden Markov Model-based named entity recognizer for biomedical domain. In Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine (BioMed '03), Vol. 13. Association for Computational Linguistics, Stroudsburg, PA, USA, 49-56.
3. Andrew McCallum and Wei Li. 2003. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 (CoNLL '03), Vol. 4. Association for Computational Linguistics, Stroudsburg, PA, USA, 188-191.
4. Burr Settles. 2004. Biomedical named entity recognition using conditional random fields and rich feature sets. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA '04), Nigel Collier, Patrick Ruch, and Adeline Nazarenko (Eds.). Association for Computational Linguistics, Stroudsburg, PA, USA, 104-107.
5. https://nlp.stanford.edu/software/CRF-NER.html
6. James Hammerton. 2003. Named Entity Recognition with Long Short-Term Memory. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 (CoNLL '03). http://www.aclweb.org/anthology/W/W03/W03-0426.pdf
7. Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Chris Dyer. 2016. Neural Architectures for Named Entity Recognition. https://arxiv.org/abs/1603.01360
8. Zhiheng Huang, Wei Xu, Kai Yu. 2015. Bidirectional LSTM-CRF Models for Sequence Tagging. https://arxiv.org/abs/1508.01991
9. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1868-5
10. Jason P. C. Chiu, Eric Nichols. 2015. Named Entity Recognition with Bidirectional LSTM-CNNs. https://arxiv.org/abs/1511.08308
11. https://static.googleusercontent.com/media/research.google.com/it//pubs/archive/43905.pdf
12. Emma Strubell, Patrick Verga, David Belanger, Andrew McCallum. 2017. Fast and Accurate Sequence Labeling with Iterated Dilated Convolutions. https://arxiv.org/abs/1702.02098
13. Vivek Kulkarni, Yashar Mehdad, Troy Chevalier. 2016. Domain Adaptation for Named Entity Recognition in Online Media with Word Embeddings. https://arxiv.org/abs/1612.00148
14. https://arxiv.org/abs/1701.02877