Few-NERD

Not only a Few-shot NER dataset

About Few-NERD

Few-NERD is a large-scale, fine-grained, manually annotated named entity recognition (NER) dataset. It contains 8 coarse-grained types, 66 fine-grained types, 188,200 sentences, 491,711 entities and 4,601,223 tokens. Three benchmark tasks are built on it: one supervised (Few-NERD (SUP)) and two few-shot (Few-NERD (INTRA) and Few-NERD (INTER)). Few-NERD was collected by researchers from Tsinghua University and DAMO Academy, Alibaba Group.

For more details about Few-NERD, please refer to our ACL-IJCNLP 2021 paper, "Few-NERD: A Few-shot Named Entity Recognition Dataset".

Getting started


Raw Datasets

Few-NERD is distributed under the CC BY-SA 4.0 license. The raw datasets can be downloaded from the following links:

Few-NERD (SUP) (14 MB)
Few-NERD (INTRA) (12 MB)
Few-NERD (INTER) (12 MB)
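
This page does not describe the layout of the raw files. As a minimal sketch, assuming each file stores one token and its label per line, separated by a tab, with a blank line between sentences (please check the GitHub repository for the actual layout), the data could be read as follows. The function name `read_token_label_lines` and the sample labels are hypothetical and used only for illustration.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # list of (token, label) pairs


def read_token_label_lines(lines: Iterable[str]) -> List[Sentence]:
    """Group tab-separated "token<TAB>label" lines into sentences.

    Assumption: blank lines separate sentences. This layout is not
    stated on this page and should be verified against the real files.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split("\t")
        # First field is the token, last field is the label.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


# Illustrative input only; these lines are not taken from the dataset.
demo = ["Paris\tlocation-GPE\n", "is\tO\n", "nice\tO\n", "\n", "Hello\tO\n"]
demo_sentences = read_token_label_lines(demo)
```

To read a downloaded file, pass an open file handle, for example `read_token_label_lines(open(path, encoding="utf-8"))`, where `path` points to one of the extracted files.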

Sampled Datasets

Because the sampling strategy has a considerable impact on few-shot learning, we also release the data that we sampled ourselves (using util/fewshotsampler.py in our code). The sampled files are named in the form train/dev/test_N_K.json. We sampled 20,000, 1,000 and 5,000 episodes for train, dev and test, respectively. The results in the paper and on the leaderboard were produced with this data.

Sampled Few-NERD (568 MB)
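
To make the N and K in the file names more concrete, here is a minimal sketch of a check for an "N way K~2K shot" support set, matching leaderboard settings such as "5 way 1~2 shot". It assumes that such a set contains exactly N entity classes and that each class occurs between K and 2K times, and that a run of consecutive tokens with the same non-"O" label counts as one entity. This is an illustration under those assumptions, not the code in util/fewshotsampler.py, and the function names are hypothetical.

```python
from collections import Counter
from itertools import groupby
from typing import List


def count_mentions(label_sequences: List[List[str]]) -> Counter:
    """Count entity mentions per class across all sentences.

    Assumption: a run of identical consecutive non-"O" labels is one mention.
    """
    counts: Counter = Counter()
    for labels in label_sequences:
        for label, _run in groupby(labels):
            if label != "O":
                counts[label] += 1
    return counts


def is_n_way_k_2k_shot(label_sequences: List[List[str]], n_way: int, k_shot: int) -> bool:
    """Return True if there are exactly n_way classes, each seen K to 2K times."""
    counts = count_mentions(label_sequences)
    if len(counts) != n_way:
        return False
    return all(k_shot <= c <= 2 * k_shot for c in counts.values())


# Illustrative support set with two classes, each mentioned twice.
support = [
    ["O", "person", "person", "O", "location"],
    ["location", "O", "person"],
]
```

With this support set, `is_n_way_k_2k_shot(support, 2, 1)` holds because both classes appear twice, which lies within the 1~2 range.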

Check out the GitHub repository for a comprehensive guide to using Few-NERD.

About the Leaderboard

To facilitate diversified research about named entities, we release all the data (including the test set) of the three tasks. We encourage the community to do research beyond these settings (such as open, unsupervised or continual NER, or entity typing and linking). Enjoy!
We still maintain a leaderboard to record peer-reviewed results.

Contact

If you have any questions about Few-NERD, or would like to update the leaderboard, feel free to email the authors.
If you use Few-NERD in your work, please cite the paper:

@inproceedings{ding2021few,
  title={Few-NERD: A Few-shot Named Entity Recognition Dataset},
  author={Ding, Ning and Xu, Guangwei and Chen, Yulin and Wang, Xiaobin and Han, Xu and Xie, Pengjun and Zheng, Hai-Tao and Liu, Zhiyuan},
  booktitle={ACL-IJCNLP},
  year={2021}
}
The supervised setting is a standard NER task.
Rank  Date          Model (reference)             Precision  Recall  F1-Measure
1     Feb 27, 2021  BERT-Tagger (Few-NERD paper)  65.56      68.78   67.13
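
As a quick sanity check on the supervised row, the reported F1-Measure can be reproduced from the precision and recall, assuming it is the usual harmonic mean of the two (the page does not state the exact formula). The function name `f1_score` is our own.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# BERT-Tagger on Few-NERD (SUP): precision 65.56, recall 68.78.
bert_tagger_f1 = round(f1_score(65.56, 68.78), 2)
print(bert_tagger_f1)  # 67.13
```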
The few-shot (intra) setting is a few-shot NER task across different coarse-grained types.
Rank  Date          Model (reference)            5 way 1~2 shot  5 way 5~10 shot  10 way 1~2 shot  10 way 5~10 shot  Avg
1     Feb 27, 2021  StructShot (Few-NERD paper)  30.21           38.00            21.03            26.42             28.92
1     Feb 27, 2021  ProtoBERT (Few-NERD paper)   20.76           42.54            15.05            35.40             28.43
1     Feb 27, 2021  NNShot (Few-NERD paper)      25.78           36.18            18.27            27.38             26.90
The few-shot (inter) setting is a few-shot NER task within coarse-grained types.
Rank  Date          Model (reference)            5 way 1~2 shot  5 way 5~10 shot  10 way 1~2 shot  10 way 5~10 shot  Avg
1     Feb 27, 2021  StructShot (Few-NERD paper)  51.88           57.32            43.34            49.57             50.53
1     Feb 27, 2021  NNShot (Few-NERD paper)      47.24           55.64            38.87            49.57             47.83
1     Feb 27, 2021  ProtoBERT (Few-NERD paper)   38.83           58.79            32.34            52.92             45.72