  • [김성범: DMQA] Data Augmentation Techniques
    NLP 2022. 1. 21. 11:33

    1. Lexical Substitution

      (1) Thesaurus-based substitution

      (2) Word-Embeddings Substitution

      (3) Masked Language Model

      (4) TF-IDF based word replacement

    2. Back Translation

    3. Text Surface Transformation

    4. Random Noise Injection

      (1) Spelling error injection

      (2) QWERTY Keyboard Error Injection

      (3) Unigram Noising

      (4) Blank Noising

      (5) Sentence Shuffling

      (6) Random Insertion

      (7) Random Swap

      (8) Random Deletion

    5. Instance Crossover Augmentation

    6. Syntax-tree Manipulation

    7. MixUp for Text

      (1) wordMixup

      (2) sentMixup

    8. Generative Methods

      (1) Conditional Pre-trained Language Models

    9. Implementation

      (1) nlpaug

      (2) textattack

     

    [Reference] https://amitness.com/2020/05/data-augmentation-for-nlp/

    ===================================

     

     

    1. Lexical Substitution

      (1) Thesaurus-based substitution

    [Paper] Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems (Zhang X, Zhao J, & LeCun Y, 2015)
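
Thesaurus-based substitution swaps a word for one of its synonyms. The following is only a minimal sketch: `TOY_THESAURUS` is a hand-written, hypothetical stand-in for a real thesaurus such as WordNet, and `synonym_replace` is an illustrative helper name, not something taken from the paper.

```python
import random

# Hypothetical toy thesaurus, written by hand for this example only.
TOY_THESAURUS = {
    "good": ["great", "fine"],
    "movie": ["film"],
    "quick": ["fast", "rapid"],
}

def synonym_replace(tokens, thesaurus, n=1, rng=None):
    """Replace up to n tokens that have an entry in the thesaurus."""
    rng = rng or random.Random()
    out = list(tokens)
    # Positions of the tokens that can be replaced at all.
    candidates = [i for i, tok in enumerate(out) if tok.lower() in thesaurus]
    for i in rng.sample(candidates, min(n, len(candidates))):
        out[i] = rng.choice(thesaurus[out[i].lower()])
    return out

tokens = "a good movie with a quick plot".split()
print(" ".join(synonym_replace(tokens, TOY_THESAURUS, n=2, rng=random.Random(0))))
```

Words without a thesaurus entry are left untouched, so the sentence length never changes.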

     

      (2) Word-Embeddings Substitution

    [Paper] TinyBERT: Distilling BERT for Natural Language Understanding (Jiao X, Yin Y, Shang L, et al., 2019)
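
Word-embedding substitution replaces a word with one of its nearest neighbors in an embedding space. The sketch below assumes cosine similarity as the distance measure and uses `TOY_EMBEDDINGS`, a made-up 3-dimensional table; in practice the vectors would come from a pre-trained model such as Word2Vec or GloVe.

```python
import math

# Hypothetical 3-d vectors, invented for illustration only.
TOY_EMBEDDINGS = {
    "good":  [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.0],
    "bad":   [-0.9, 0.1, 0.0],
    "film":  [0.0, 0.1, 0.9],
    "movie": [0.0, 0.2, 0.8],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbors(word, embeddings, k=1):
    """Return the k words most similar to `word`, excluding the word itself."""
    target = embeddings[word]
    scored = [(cosine(target, vec), other)
              for other, vec in embeddings.items() if other != word]
    return [other for _, other in sorted(scored, reverse=True)[:k]]

def embedding_replace(tokens, embeddings, index):
    """Replace the token at `index` with its nearest neighbor, if it has a vector."""
    out = list(tokens)
    if out[index] in embeddings:
        out[index] = nearest_neighbors(out[index], embeddings, k=1)[0]
    return out

print(embedding_replace("a good film".split(), TOY_EMBEDDINGS, index=1))
```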

     

      (3) Masked Language Model

    [Paper] BAE: BERT-based Adversarial Examples for Text Classification (Garg S, Ramakrishnan G, 2020)

     

      (4) TF-IDF based word replacement

    [Paper] Unsupervised Data Augmentation for Consistency Training (Xie Q, Dai Z, et al., 2019)
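
The idea behind TF-IDF based replacement is that words with a low TF-IDF score carry little information, so they can be swapped without changing the label. The sketch below is a simplification under stated assumptions: the `threshold` value is arbitrary, and replacement words are drawn uniformly from the vocabulary, which is not necessarily how the paper samples them.

```python
import math
import random
from collections import Counter

def idf_table(corpus):
    """Inverse document frequency for every word in a tokenized corpus."""
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}

def tfidf_replace(doc, idf, vocab, threshold, rng=None):
    """Swap tokens whose TF-IDF score is below `threshold` for a random vocab word."""
    rng = rng or random.Random()
    tf = Counter(doc)
    out = []
    for word in doc:
        score = (tf[word] / len(doc)) * idf.get(word, 0.0)
        # Low score: uninformative word, safe to replace. High score: keep.
        out.append(rng.choice(vocab) if score < threshold else word)
    return out

corpus = [
    "the film was a delight".split(),
    "the plot was thin".split(),
    "the cast was superb".split(),
]
idf = idf_table(corpus)
vocab = sorted(idf)
print(tfidf_replace(corpus[0], idf, vocab, threshold=0.1, rng=random.Random(0)))
```

In this toy corpus "the" and "was" appear in every document, so their IDF is zero and only they are candidates for replacement.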

     

     

    2. Back Translation

    [Paper] Unsupervised Data Augmentation for Consistency Training (Xie Q, Dai Z, et al., 2019)

     

     

     

    3. Text Surface Transformation
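
Surface transformations are simple pattern-matching rewrites, typically done with regular expressions; expanding contractions is a common example. The sketch below assumes a small hand-picked `CONTRACTIONS` table (a real one would be much longer) and does not handle ambiguous cases such as "it's" meaning "it has".

```python
import re

# Small hand-picked table for illustration; ambiguous contractions are ignored.
CONTRACTIONS = {
    "it's": "it is",
    "don't": "do not",
    "i'm": "i am",
    "can't": "cannot",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)

def expand_contractions(text):
    """Rewrite each known contraction as its expanded form."""
    def _expand(match):
        expanded = CONTRACTIONS[match.group(0).lower()]
        # Preserve a leading capital letter.
        if match.group(0)[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded
    return _PATTERN.sub(_expand, text)

print(expand_contractions("It's fine, but I'm sure they don't agree."))
# prints: It is fine, but I am sure they do not agree.
```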

     

     

     

    4. Random Noise Injection

      (1) Spelling error injection

     

     

      (2) QWERTY Keyboard Error Injection
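
This method simulates typos caused by hitting a key next to the intended one. A minimal sketch follows, assuming a partial, hand-written neighbor map (`QWERTY_NEIGHBORS`) and a per-character error probability `p`; both are illustrative choices.

```python
import random

# Partial neighbor map for a QWERTY layout (illustrative, not exhaustive).
QWERTY_NEIGHBORS = {
    "a": "qwsz", "e": "wsdr", "i": "ujko", "o": "iklp",
    "n": "bhjm", "r": "edft", "s": "awedxz", "t": "rfgy",
}

def keyboard_noise(text, p=0.1, rng=None):
    """Replace each mapped character with an adjacent key with probability p."""
    rng = rng or random.Random()
    chars = []
    for ch in text:
        neighbors = QWERTY_NEIGHBORS.get(ch.lower())
        if neighbors and rng.random() < p:
            chars.append(rng.choice(neighbors))
        else:
            chars.append(ch)
    return "".join(chars)

print(keyboard_noise("data augmentation is neat", p=0.2, rng=random.Random(0)))
```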

     

     

      (3) Unigram Noising
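
Unigram noising replaces words with words sampled from the unigram frequency distribution of a corpus, so frequent words are picked more often. A minimal sketch, assuming a replacement probability `p` per token and a flat token list as the corpus:

```python
import random
from collections import Counter

def unigram_noise(tokens, corpus_tokens, p=0.1, rng=None):
    """With probability p, replace a token by one drawn from the unigram distribution."""
    rng = rng or random.Random()
    counts = Counter(corpus_tokens)
    words = list(counts)
    weights = [counts[w] for w in words]  # sampling weight = corpus frequency
    return [
        rng.choices(words, weights=weights, k=1)[0] if rng.random() < p else tok
        for tok in tokens
    ]

corpus_tokens = "the cat sat on the mat and the dog sat too".split()
print(unigram_noise("a bird flew over the house".split(), corpus_tokens, p=0.3,
                    rng=random.Random(0)))
```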

     

     

      (4) Blank Noising
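
Blank noising replaces random words with a placeholder token. In the sketch below the placeholder `"_"` and the probability `p` are assumed defaults.

```python
import random

def blank_noise(tokens, p=0.1, blank="_", rng=None):
    """With probability p, replace a token by the placeholder `blank`."""
    rng = rng or random.Random()
    return [blank if rng.random() < p else tok for tok in tokens]

print(blank_noise("the service was slow but friendly".split(), p=0.3,
                  rng=random.Random(0)))
```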

     

     

      (5) Sentence Shuffling
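
Sentence shuffling reorders the sentences of a multi-sentence text. The sketch below assumes a naive splitter that breaks on whitespace after `.`, `!`, or `?`; a real pipeline would likely use a proper sentence tokenizer.

```python
import random
import re

def shuffle_sentences(text, rng=None):
    """Split text into sentences, shuffle them, and join them back."""
    rng = rng or random.Random()
    # Naive split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    rng.shuffle(sentences)
    return " ".join(sentences)

print(shuffle_sentences("I went out. It rained! Did I mind? Not really.",
                        rng=random.Random(0)))
```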

     

     

      (6) Random Insertion
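
Random insertion picks a random word from the sentence, looks up one of its synonyms, and inserts that synonym at a random position. A minimal sketch, where `TOY_SYNONYMS` is a hypothetical hand-written synonym table:

```python
import random

# Hypothetical synonym table for illustration only.
TOY_SYNONYMS = {"happy": ["glad", "cheerful"], "trip": ["journey"]}

def random_insertion(tokens, synonyms, n=1, rng=None):
    """Insert n synonyms of randomly chosen words at random positions."""
    rng = rng or random.Random()
    out = list(tokens)
    for _ in range(n):
        candidates = [t for t in out if t in synonyms]
        if not candidates:
            break  # nothing in the sentence has a known synonym
        new_word = rng.choice(synonyms[rng.choice(candidates)])
        out.insert(rng.randint(0, len(out)), new_word)
    return out

print(random_insertion("a happy trip".split(), TOY_SYNONYMS, n=2, rng=random.Random(0)))
```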

     

     

      (7) Random Swap
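
Random swap exchanges the positions of two randomly chosen words, repeated `n` times. A minimal sketch:

```python
import random

def random_swap(tokens, n=1, rng=None):
    """Swap two distinct random positions, n times."""
    rng = rng or random.Random()
    out = list(tokens)
    if len(out) < 2:
        return out  # nothing to swap
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

print(random_swap("one two three four five".split(), n=2, rng=random.Random(0)))
```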

     

     

      (8) Random Deletion
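
Random deletion removes each word independently with probability `p`. The sketch below adds one assumed safeguard: if every word would be deleted, a single random word is kept so the sentence is never empty.

```python
import random

def random_deletion(tokens, p=0.1, rng=None):
    """Drop each token with probability p, but never return an empty list."""
    rng = rng or random.Random()
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    # Fall back to one random token if everything was deleted.
    return kept or [rng.choice(tokens)]

print(random_deletion("the food was surprisingly good".split(), p=0.3,
                      rng=random.Random(0)))
```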

     

     

     

    5. Instance Crossover Augmentation
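
Instance crossover builds a new example by combining halves of two existing examples that share the same label. The sketch below assumes token-level halves split at the midpoint and a dataset given as `(tokens, label)` pairs; the function names are illustrative.

```python
import random
from collections import defaultdict

def crossover(tokens_a, tokens_b):
    """Exchange the second halves of two token lists."""
    mid_a, mid_b = len(tokens_a) // 2, len(tokens_b) // 2
    child_1 = tokens_a[:mid_a] + tokens_b[mid_b:]
    child_2 = tokens_b[:mid_b] + tokens_a[mid_a:]
    return child_1, child_2

def crossover_augment(dataset, n_new, rng=None):
    """Create n_new examples by crossing random pairs that share a label."""
    rng = rng or random.Random()
    by_label = defaultdict(list)
    for tokens, label in dataset:
        by_label[label].append(tokens)
    # Only labels with at least two examples can produce a pair.
    eligible = [label for label, items in by_label.items() if len(items) >= 2]
    new_examples = []
    while eligible and len(new_examples) < n_new:
        label = rng.choice(eligible)
        parent_a, parent_b = rng.sample(by_label[label], 2)
        new_examples.append((crossover(parent_a, parent_b)[0], label))
    return new_examples

data = [
    ("so good today".split(), "pos"),
    ("really great fun".split(), "pos"),
    ("quite bad".split(), "neg"),
]
print(crossover_augment(data, n_new=2, rng=random.Random(0)))
```

The result may not be grammatical, but the new example keeps the label of its parents.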

     

     

     

    6. Syntax-tree Manipulation

     

     

     

    7. MixUp for Text

      (1) wordMixup

    [Paper] Augmenting Data with Mixup for Sentence Classification: An Empirical Study (Guo H, Mao Y, Zhang R, 2019)
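
wordMixup interpolates two sentences at the word-embedding level: both sequences are zero-padded to the same length, then each pair of word vectors and the two label vectors are mixed with the same ratio. The sketch below uses plain Python lists instead of tensors and assumes the ratio is drawn from Beta(alpha, alpha); the function names are illustrative.

```python
import random

def pad(seq, length, dim):
    """Zero-pad a list of word vectors to `length` time steps."""
    return seq + [[0.0] * dim for _ in range(length - len(seq))]

def word_mixup(seq_a, label_a, seq_b, label_b, alpha=1.0, rng=None):
    """Mix two embedded sentences and their one-hot labels with ratio lam."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing ratio in [0, 1]
    dim = len(seq_a[0])
    length = max(len(seq_a), len(seq_b))
    a, b = pad(seq_a, length, dim), pad(seq_b, length, dim)
    mixed_seq = [
        [lam * x + (1 - lam) * y for x, y in zip(vec_a, vec_b)]
        for vec_a, vec_b in zip(a, b)
    ]
    mixed_label = [lam * p + (1 - lam) * q for p, q in zip(label_a, label_b)]
    return mixed_seq, mixed_label, lam

seq_a = [[1.0, 0.0], [0.0, 1.0]]   # two words, 2-d toy embeddings
seq_b = [[0.0, 2.0]]               # one word
mixed, label, lam = word_mixup(seq_a, [1.0, 0.0], seq_b, [0.0, 1.0],
                               rng=random.Random(0))
print(lam, mixed, label)
```

The mixed label is a soft label, so training needs a loss that accepts probability targets rather than a single class index.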

     

      (2) sentMixup
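
sentMixup applies the same interpolation one stage later, to the sentence embeddings produced by the encoder rather than to the word embeddings. As a hedged summary, with notation assumed here for illustration ($f$ is the sentence encoder, $B^i$ and $B^j$ are the two embedded input sentences, and $y^i$, $y^j$ are their one-hot labels):

```latex
\tilde{h}^{ij} = \lambda\, f(B^{i}) + (1 - \lambda)\, f(B^{j}), \qquad
\tilde{y}^{ij} = \lambda\, y^{i} + (1 - \lambda)\, y^{j}, \qquad
\lambda \sim \mathrm{Beta}(\alpha, \alpha)
```

The mixed vector $\tilde{h}^{ij}$ is then passed to the classification layer and trained against the soft label $\tilde{y}^{ij}$.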

     

     

     

     

    8. Generative Methods

      (1) Conditional Pre-trained Language Models

     

     

    9. Implementation

      (1) nlpaug

     

      (2) textattack

     

     

     

     

     

     

     
