Named Entity Recognition
Discussion of the State of the Art
According to [1], state-of-the-art implementations of named entity recognition (NER) rely heavily on algorithms based on the Hidden Markov Model (HMM) [2] and on Conditional Random Fields (CRF) [3]. The Hidden Markov Model is a statistical model which describes the system as being in one of a number of possible states; each state is associated with possible outputs and their respective probabilities, and the system moves between states with given transition probabilities. The Conditional Random Field is another statistical model whose distinctive feature is its context-aware nature: where a discrete classifier maps a single sample to a single class, a CRF outputs a sequence of labels for a sequence of samples. For instance, the widely used Stanford Named Entity Recognizer [4] uses CRF.

In recent years, however, advances in both GPU technology and deep learning techniques have triggered the advent of Long short-term memory [5] neural network (LSTMNN) architectures, which are often used in conjunction with CRF to obtain state-of-the-art performance [6][7] and provide a model which, according to [8], has become a fundamental component for major companies. LSTMNNs are therefore currently preferred to HMMs.

As a last note, a recent paper [9] introduces the possibility of using Iterated Dilated Convolutional Neural Networks (ID-CNNs) in place of LSTMNNs to drastically improve computation time through parallelization while keeping the same level of accuracy, which suggests ID-CNNs could be the next step in improving NER.

Matters of great concern in NER at present include training data scarcity and inter-domain generalization [10]. In order to be effective on a language domain, current NER systems need large labeled datasets related to that domain [11]. Such training data is not available for all language domains, which makes it impossible to apply NER to them efficiently. Furthermore, if a language domain does not follow strict language conventions and allows for a wide use of the language, then the model will fail to generalize due to excessive heterogeneity. Examples of such domains are sport and finance.

This is why one of the big challenges is, as stated in [10], “adapt[ing] models learned on domains where large amounts of annotated training data are available to domains with scarce annotated data”.
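To make the HMM description above concrete, the sketch below shows how an HMM-style tagger labels a sentence: the hidden states are NER tags, each tag emits words with some probability, and Viterbi decoding recovers the most likely tag sequence. It is a minimal toy example; the tag set, the example sentence, and all probabilities are illustrative assumptions, not values taken from any of the cited systems.

# Minimal sketch of HMM-style sequence labelling for NER with Viterbi decoding.
# States = NER tags; each tag emits words with some probability; transitions
# between tags also have probabilities. All numbers below are toy values.

import math

TAGS = ["O", "PER", "LOC"]

# P(first tag), P(tag_t | tag_{t-1}), and P(word | tag) -- illustrative only
start_p = {"O": 0.8, "PER": 0.1, "LOC": 0.1}
trans_p = {
    "O":   {"O": 0.8, "PER": 0.1, "LOC": 0.1},
    "PER": {"O": 0.6, "PER": 0.3, "LOC": 0.1},
    "LOC": {"O": 0.7, "PER": 0.1, "LOC": 0.2},
}
emit_p = {
    "O":   {"works": 0.3, "in": 0.3, "alice": 0.01, "paris": 0.01},
    "PER": {"alice": 0.5, "paris": 0.05, "works": 0.01, "in": 0.01},
    "LOC": {"paris": 0.5, "alice": 0.05, "works": 0.01, "in": 0.01},
}

def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    # best[t][tag] = (log-prob of the best path ending in `tag` at position t, backpointer)
    best = [{t: (math.log(start_p[t]) + math.log(emit_p[t].get(words[0], 1e-6)), None)
             for t in TAGS}]
    for w in words[1:]:
        best.append({})
        for t in TAGS:
            score, prev = max(
                (best[-2][p][0] + math.log(trans_p[p][t]) + math.log(emit_p[t].get(w, 1e-6)), p)
                for p in TAGS)
            best[-1][t] = (score, prev)
    # Backtrack from the best final tag to recover the full sequence
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["alice", "works", "in", "paris"]))  # expected: ['PER', 'O', 'O', 'LOC']

A linear-chain CRF replaces these generative emission and transition probabilities with feature-based scores computed over the whole sequence, but the decoding step that produces the label sequence is essentially the same Viterbi search.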
Bibliography
1. https://en.wikipedia.org/wiki/Named-entity_recognition
2. https://en.wikipedia.org/wiki/Hidden_Markov_model
3. https://en.wikipedia.org/wiki/Conditional_random_field
4. https://nlp.stanford.edu/software/CRF-NER.html
5. https://en.wikipedia.org/wiki/Long_short-term_memory
6. https://arxiv.org/abs/1603.01360
7. https://arxiv.org/abs/1508.01991
8. https://en.wikipedia.org/wiki/Long_short-term_memory#History
9. https://arxiv.org/abs/1702.02098
10. https://arxiv.org/abs/1612.00148
11. https://arxiv.org/abs/1701.02877