Named Entity Recognition
Discussion of the State of the Art
Implementations of Named Entity Recognition (NER) relied for a long time on algorithms based on statistical models such as Hidden Markov Models (HMMs) [1][2] and Conditional Random Fields (CRFs) [3][4]. For instance, the widely used Stanford Named Entity Recognizer [5] uses CRFs; a minimal feature-based CRF tagger is sketched below.

In recent years, however, advances in both GPU technology and deep learning techniques have triggered the advent of architectures built on Long Short-Term Memory neural networks (LSTMNNs) [6]. LSTMNNs are often used in conjunction with a CRF layer and achieve performance equivalent to the aforementioned methods [7][8] without the need for complex feature engineering [7][9][10]; the second sketch below shows the core of such a model. LSTMNNs have become a fundamental component for major companies, which prefer them over earlier approaches [11].

It is worth noting that a recent paper [12] introduces Iterated Dilated Convolutional Neural Networks (ID-CNNs, third sketch below) as a replacement for LSTMNNs that drastically improves computation time by fully exploiting the GPU's parallelism while keeping the same level of accuracy, which suggests ID-CNNs could be the next step in improving NER.

As for the current state of NER research, matters of great concern include training data scarcity and inter-domain generalization [13]. To perform well on a language domain, current NER systems need large labeled datasets related to that domain [14]. Such training data is not available for all language domains, which makes it impossible to apply NER to them effectively. Furthermore, if a language domain does not follow strict language conventions and allows for a wide use of the language, the model will fail to generalize due to excessive heterogeneity in the data; Sport and Finance are examples of such domains. This is why one of the big challenges is, as stated in [13], “adapt[ing] models learned on domains where large amounts of annotated training data are available to domains with scarce annotated data”.
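The kind of hand-crafted feature engineering a CRF tagger relies on is illustrated by the following minimal sketch. It assumes the third-party sklearn-crfsuite package; the feature set and the toy sentence are invented for illustration and are far smaller than what a production system such as the Stanford recognizer uses.

# Minimal sketch of a feature-based CRF tagger, assuming the
# third-party sklearn-crfsuite package (pip install sklearn-crfsuite).
# Features and toy data are illustrative, not Stanford NER's.
import sklearn_crfsuite

def word_features(sent, i):
    """Hand-crafted features for the i-th token: the kind of
    engineering that LSTMNN-based models make unnecessary."""
    word = sent[i]
    feats = {
        'word.lower': word.lower(),
        'word.istitle': word.istitle(),
        'word.isupper': word.isupper(),
        'suffix3': word[-3:],
    }
    # Context features from neighbouring tokens.
    feats['prev.lower'] = sent[i - 1].lower() if i > 0 else '<BOS>'
    feats['next.lower'] = sent[i + 1].lower() if i < len(sent) - 1 else '<EOS>'
    return feats

# One toy training sentence with IOB labels.
sents = [['EU', 'rejects', 'German', 'call']]
labels = [['B-ORG', 'O', 'B-MISC', 'O']]

X = [[word_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm='lbfgs', max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))  # e.g. [['B-ORG', 'O', 'B-MISC', 'O']]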
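By contrast, an LSTMNN tagger learns its features from raw token embeddings. The following is a minimal PyTorch sketch of the bidirectional LSTM encoder at the core of the cited BiLSTM-CRF architectures; all sizes are arbitrary, and in the papers the per-token scores ("emissions") feed a CRF layer that decodes the best whole tag sequence rather than the per-token argmax used here.

# Minimal PyTorch sketch of the BiLSTM encoder behind BiLSTM-CRF
# taggers. Dimensions and vocabulary size are arbitrary choices.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # The bidirectional LSTM reads each sentence left-to-right and
        # right-to-left; halving the hidden size keeps the concatenated
        # output at hidden_dim.
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)
        self.emit = nn.Linear(hidden_dim, num_tags)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))
        return self.emit(h)  # per-token tag scores ("emissions")

model = BiLSTMTagger(vocab_size=10000, num_tags=9)
tokens = torch.randint(0, 10000, (1, 12))  # one 12-token sentence
emissions = model(tokens)                  # shape (1, 12, 9)
# In the cited papers a CRF layer decodes the emissions jointly; a
# plain per-token argmax is the CRF-free approximation:
print(emissions.argmax(dim=-1))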
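The ID-CNN alternative replaces the LSTM's recurrence with a small stack of dilated convolutions applied repeatedly ("iterated"), so every layer is computed for all sequence positions at once. Below is a minimal PyTorch sketch under that reading of the paper; dilation rates, depth, and sizes are illustrative choices, not the authors' exact configuration.

# Minimal PyTorch sketch of an Iterated Dilated CNN tagger.
# Dilation rates, iteration count, and sizes are illustrative.
import torch
import torch.nn as nn

class IDCNNTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, dim=100,
                 dilations=(1, 2, 4), iterations=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # Exponentially growing dilation widens the receptive field
        # without pooling; padding=d keeps the sequence length fixed.
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size=3, dilation=d, padding=d)
            for d in dilations)
        self.iterations = iterations
        self.emit = nn.Linear(dim, num_tags)

    def forward(self, token_ids):
        x = self.embed(token_ids).transpose(1, 2)  # (batch, dim, seq)
        for _ in range(self.iterations):           # the same block is
            for conv in self.convs:                # re-applied ("iterated")
                x = torch.relu(conv(x))
        return self.emit(x.transpose(1, 2))        # per-token tag scores

model = IDCNNTagger(vocab_size=10000, num_tags=9)
scores = model(torch.randint(0, 10000, (1, 12)))  # shape (1, 12, 9)

Unlike the LSTM's token-by-token recurrence, every convolution here runs in parallel across all positions, which is where the speedup reported in [12] comes from.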
Bibliography
1. https://dl.acm.org/citation.cfm?id=1119204
2. https://dl.acm.org/citation.cfm?id=1118965
3. https://dl.acm.org/citation.cfm?id=1119206
4. https://dl.acm.org/citation.cfm?id=1567618
5. https://nlp.stanford.edu/software/CRF-NER.html
6. http://www.aclweb.org/anthology/W/W03/W03-0426.pdf
7. https://arxiv.org/abs/1603.01360
8. https://arxiv.org/abs/1508.01991
9. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1868-5
10. https://arxiv.org/pdf/1511.08308v5.pdf
11. https://static.googleusercontent.com/media/research.google.com/it//pubs/archive/43905.pdf
12. https://arxiv.org/abs/1702.02098
13. https://arxiv.org/abs/1612.00148
14. https://arxiv.org/abs/1701.02877