Named entity recognition
Named entity recognition (NER) is the task of tagging entities in text with their corresponding type. Approaches typically use BIO notation (also written IOB), which differentiates the beginning (B) and the inside (I) of entities. O is used for non-entity tokens.
Example:
Mark | Watney | visited | Mars |
---|---|---|---|
B-PER | I-PER | O | B-LOC |
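
The tagging in the example can be turned back into entities by grouping each B- tag with the I- tags that follow it. The snippet below is a minimal sketch of that decoding step; the function name `bio_to_spans` is hypothetical, and treating a stray I- tag as the start of a new entity is an assumption rather than part of the notation.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (type, start, end, text) spans; end is exclusive."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        # An I- tag continues the open entity only if the type matches.
        if tag.startswith("I-") and etype == tag[2:]:
            continue
        # Any other tag closes the open entity, if there is one.
        if etype is not None:
            spans.append((etype, start, i, " ".join(tokens[start:i])))
            start, etype = None, None
        # B- opens a new entity; a stray I- is treated like B- (assumption).
        if tag.startswith(("B-", "I-")):
            start, etype = i, tag[2:]
    # Close an entity that runs to the end of the sentence.
    if etype is not None:
        spans.append((etype, start, len(tags), " ".join(tokens[start:])))
    return spans


tokens = ["Mark", "Watney", "visited", "Mars"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
# [('PER', 0, 2, 'Mark Watney'), ('LOC', 3, 4, 'Mars')]
```
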
ArmanPersoNERCorpus
The ArmanPersoNERCorpus dataset contains 7,682 sentences with 250,015 tokens, tagged in IOB format with six classes: Organization, Person, Location, Facility, Event, and Product.
Download Links: ARMAN
Model | F1 | Paper / Source | Code |
---|---|---|---|
ParsBERT (Farahani et al., 2020) | 99.84 | ParsBERT: Transformer-based Model for Persian Language Understanding | Official |
LSTM-CRF (Hafezi, Rezaeian, 2018) | 86.55 | Neural Architecture for Persian Named Entity Recognition | - |
mBERT (Taher et al., 2020) | 84.03 | Beheshti-NER: Persian Named Entity Recognition Using BERT | Official |
Deep-CRF (Bokaei, Mahmoudi, 2018) | 81.50 | Improved Deep Persian Named Entity Recognition | - |
Deep-Local (Bokaei, Mahmoudi, 2018) | 79.19 | Improved Deep Persian Named Entity Recognition | - |
BiLSTM-CRF (Poostchi et al., 2018) | 77.45 | BiLSTM-CRF for Persian Named-Entity Recognition | - |
SVM-HMM (Poostchi et al., 2016) | 72.59 | PersoNER: Persian Named-Entity Recognition | - |
PEYMA
The PEYMA dataset contains 7,145 sentences with 302,530 tokens, of which 41,148 are tagged in IOB format with seven classes: Organization, Percent, Money, Location, Date, Time, and Person.
Download Links: PEYMA
Model | F1 | Paper / Source | Code |
---|---|---|---|
ParsBERT (Farahani et al., 2020) | 93.40 | ParsBERT: Transformer-based Model for Persian Language Understanding | Official |
mBERT (Taher et al., 2020) | 90.59 | Beheshti-NER: Persian Named Entity Recognition Using BERT | Official |
Rule-Based-CRF (Shahshahani et al., 2018) | 84.00 | PEYMA: A Tagged Corpus for Persian Named Entities | - |
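
The tables report F1 without stating how it is computed, and the papers may differ in their evaluation setup. One common convention for NER is exact-match, entity-level micro F1, where a predicted entity counts as correct only if both its boundaries and its type match the gold annotation. The sketch below assumes that convention; the helper names `entity_spans` and `entity_f1` are hypothetical, and the result is a fraction that would be multiplied by 100 to compare with the scores listed here.

```python
def entity_spans(tags):
    """Return the set of (type, start, end) spans encoded by one BIO tag sequence."""
    spans, start, etype = set(), None, None
    # A sentinel "O" closes any entity that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag.startswith("I-") and etype == tag[2:]:
            continue
        if etype is not None:
            spans.add((etype, start, i))
            start, etype = None, None
        # A stray I- tag is treated as the start of an entity (assumption).
        if tag.startswith(("B-", "I-")):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold_sentences, pred_sentences):
    """Micro-averaged F1 over exactly matching entity spans."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        g, p = entity_spans(gold), entity_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "O", "O", "B-LOC"]]  # PER boundary is wrong, LOC is correct
print(entity_f1(gold, pred))  # 0.5
```
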