How can we effectively regularize BERT? Although BERT has proven effective on a variety of NLP tasks, it often overfits when only a small number of training instances are available. A promising direction for regularizing BERT is to prune its attention heads according to a proxy score for head importance.
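To make score-based head pruning concrete, the following is a minimal sketch in plain Python. It assumes that an importance score has already been computed for every attention head. The text here does not specify the proxy score, so the values below are made up for illustration, and the function name `head_mask_from_scores` and the `prune_ratio` parameter are hypothetical. The sketch ranks all heads globally and masks out the lowest-scoring fraction.

```python
from typing import List


def head_mask_from_scores(
    scores: List[List[float]], prune_ratio: float
) -> List[List[int]]:
    """Return a 0/1 mask (layers x heads) that zeroes the lowest-scoring heads.

    `scores[l][h]` is an assumed, precomputed importance score for head `h`
    in layer `l`; how that score is obtained is outside this sketch.
    """
    if not 0.0 <= prune_ratio <= 1.0:
        raise ValueError("prune_ratio must be in [0, 1]")

    # Flatten to (score, layer, head) triples so heads are ranked globally
    # across layers rather than within each layer.
    ranked = sorted(
        (score, layer, head)
        for layer, row in enumerate(scores)
        for head, score in enumerate(row)
    )

    # Number of heads to prune, rounded down.
    n_prune = int(len(ranked) * prune_ratio)

    # Start with every head kept, then switch off the least important ones.
    mask = [[1] * len(row) for row in scores]
    for _, layer, head in ranked[:n_prune]:
        mask[layer][head] = 0
    return mask


# Toy example: 2 layers x 3 heads with made-up importance scores.
toy_scores = [[0.9, 0.1, 0.5], [0.3, 0.8, 0.2]]
toy_mask = head_mask_from_scores(toy_scores, prune_ratio=0.5)
```

In practice such a mask would be applied to the attention outputs of the corresponding heads during fine-tuning. Ranking globally rather than per layer is one possible design choice and is not taken from the source.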