[김성범:DMQA] Data Augmentation Techniques

NLP 2022. 1. 21. 11:33

1. Lexical Substitution

(1) Thesaurus-based substitution

(2) Word-Embeddings Substitution

(3) Masked Language Model

(4) TF-IDF based word replacement

2. Back Translation

3. Text Surface Transformation

4. Random Noise Injection

(1) Spelling error injection

(2) QWERTY Keyboard Error Injection

(3) Unigram Noising

(4) Blank Noising

(5) Sentence Shuffling

(6) Random Insertion

(7) Random Swap

(8) Random Deletion

5. Instance Crossover Augmentation

6. Syntax-tree Manipulation

7. MixUp for Text

(1) wordMixup

(2) sentMixup

8. Generative Methods

(1) Conditional Pre-trained Language Models

9. Implementation

(1) nlpaug

(2) textattack

[Reference] https://amitness.com/2020/05/data-augmentation-for-nlp/

===================================

1. Lexical Substitution

(1) Thesaurus-based substitution

[Paper] Character-level convolutional networks for text classification. Advances in Neural Information Processing Systems (Zhang X, Zhao J, & LeCun Y, 2015)
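
As a rough illustration of thesaurus-based substitution, the sketch below swaps a few words for synonyms. The dictionary `TOY_THESAURUS` and the function `synonym_replace` are hypothetical names made up for this example; a real setup would look synonyms up in a resource such as WordNet rather than a hand-written table.

```python
import random

# Hypothetical toy thesaurus (assumption); stands in for WordNet or a similar resource.
TOY_THESAURUS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "joyful"],
    "big": ["large", "huge"],
}

def synonym_replace(sentence, n=1, rng=None):
    """Replace up to n words that have an entry in the toy thesaurus."""
    rng = rng or random.Random()
    words = sentence.split()
    # Positions of words we know synonyms for.
    candidates = [i for i, w in enumerate(words) if w.lower() in TOY_THESAURUS]
    for i in rng.sample(candidates, min(n, len(candidates))):
        words[i] = rng.choice(TOY_THESAURUS[words[i].lower()])
    return " ".join(words)

print(synonym_replace("the quick dog is happy", n=2))
```

Words without a thesaurus entry are left untouched, so a sentence with no known words comes back unchanged.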

(2) Word-Embeddings Substitution

[Paper] TinyBERT: Distilling BERT for natural language understanding (Jiao X, Yin Y, Shang L, et al., 2019)

(3) Masked Language Model

[Paper] BAE: BERT-based Adversarial Examples for Text Classification (Garg S, Ramakrishnan G, 2020)

(4) TF-IDF based word replacement

[Paper] Unsupervised data augmentation for consistency training (Xie Q, Dai Z, et al., 2019)

2. Back Translation

[Paper] Unsupervised data augmentation for consistency training (Xie Q, Dai Z, et al., 2019)

3. Text Surface Transformation

4. Random Noise Injection

(1) Spelling error injection

(2) QWERTY Keyboard Error Injection
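
A minimal sketch of QWERTY keyboard error injection: each character is replaced, with probability `p`, by a key that sits next to it on the keyboard. The adjacency map `QWERTY_NEIGHBORS` is a partial, hand-written assumption covering only a few letters, and `keyboard_noise` is a hypothetical helper name.

```python
import random

# Partial adjacency map for a QWERTY layout (illustrative assumption, not exhaustive).
QWERTY_NEIGHBORS = {
    "a": "qwsz", "e": "wsdr", "i": "ujko", "o": "iklp",
    "n": "bhjm", "s": "awedxz", "t": "rfgy",
}

def keyboard_noise(text, p=0.1, rng=None):
    """Replace each mapped character with a neighboring key with probability p."""
    rng = rng or random.Random()
    out = []
    for ch in text:
        neighbors = QWERTY_NEIGHBORS.get(ch.lower())
        if neighbors and rng.random() < p:
            out.append(rng.choice(neighbors))  # simulate hitting an adjacent key
        else:
            out.append(ch)  # unmapped characters and spaces pass through
    return "".join(out)

print(keyboard_noise("data augmentation for text", p=0.2))
```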

(3) Unigram Noising

(4) Blank Noising
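
Blank noising can be sketched as replacing random words with a placeholder token. The placeholder `"_"`, the probability `p`, and the function name `blank_noise` are assumptions chosen for illustration.

```python
import random

def blank_noise(sentence, p=0.1, placeholder="_", rng=None):
    """Replace each word with a placeholder token with probability p."""
    rng = rng or random.Random()
    return " ".join(
        placeholder if rng.random() < p else word
        for word in sentence.split()
    )

print(blank_noise("the movie was surprisingly good", p=0.3))
```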

(5) Sentence Shuffling

(6) Random Insertion

(7) Random Swap
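
A minimal sketch of random swap, assuming whitespace tokenization: pick two distinct positions and exchange the words, repeated `n` times. The function name `random_swap` is a hypothetical helper.

```python
import random

def random_swap(sentence, n=1, rng=None):
    """Swap two randomly chosen word positions, n times."""
    rng = rng or random.Random()
    words = sentence.split()
    if len(words) < 2:
        return sentence  # nothing to swap
    for _ in range(n):
        i, j = rng.sample(range(len(words)), 2)  # two distinct indices
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

print(random_swap("this article covers data augmentation", n=1))
```

The output always contains the same words as the input; only their order changes.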

(8) Random Deletion
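
Random deletion can be sketched as dropping each word independently with probability `p`. The fallback that keeps one word when everything would be deleted is a design choice assumed here, and `random_deletion` is a hypothetical helper name.

```python
import random

def random_deletion(sentence, p=0.1, rng=None):
    """Drop each word with probability p, always keeping at least one word."""
    rng = rng or random.Random()
    words = sentence.split()
    if len(words) <= 1:
        return sentence
    kept = [w for w in words if rng.random() >= p]
    if not kept:
        kept = [rng.choice(words)]  # avoid returning an empty sentence
    return " ".join(kept)

print(random_deletion("the model sees fewer words", p=0.3))
```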

5. Instance Crossover Augmentation

6. Syntax-tree Manipulation

7. MixUp for Text

(1) wordMixup

[Paper] Augmenting data with mixup for sentence classification: An empirical study (Guo H, Mao Y, Zhang R, 2019)
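
The idea behind wordMixup can be sketched as a linear interpolation of two padded sequences of word embeddings, with the one-hot labels mixed by the same ratio. The function `word_mixup`, the tiny 2-dimensional vectors, and the plain-list representation are assumptions for illustration; an actual model would do this on tensors inside the network.

```python
def word_mixup(emb_a, emb_b, label_a, label_b, lam):
    """Interpolate two equally padded embedding sequences and their one-hot labels."""
    if len(emb_a) != len(emb_b):
        raise ValueError("pad both sequences to the same length first")
    # Mix position by position, dimension by dimension.
    mixed_emb = [
        [lam * x + (1 - lam) * y for x, y in zip(vec_a, vec_b)]
        for vec_a, vec_b in zip(emb_a, emb_b)
    ]
    # The soft label uses the same mixing ratio.
    mixed_label = [lam * x + (1 - lam) * y for x, y in zip(label_a, label_b)]
    return mixed_emb, mixed_label

sent_a = [[1.0, 0.0], [0.0, 2.0]]  # toy embeddings, 2 words x 2 dims
sent_b = [[0.0, 1.0], [2.0, 0.0]]
mixed, label = word_mixup(sent_a, sent_b, [1.0, 0.0], [0.0, 1.0], lam=0.5)
print(mixed, label)
```

In mixup-style training the ratio is usually drawn from a Beta distribution, for example with `random.betavariate(alpha, alpha)`, rather than fixed at 0.5 as in this toy call.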

(2) sentMixup

8. Generative Methods

(1) Conditional Pre-trained Language Models

9. Implementation

(1) nlpaug

(2) textattack
