
Empirical Laws of Natural Language Processing for Neural Language Generated Text

Many sequence models generate remarkably human-like text, but comparatively little research has examined how closely their output matches human-written text.

In this work, text is generated using Long Short-Term Memory networks (LSTMs) and the Generative Pretrained Transformer-2 (GPT-2). The text produced by both models follows Zipf's law and Heaps' law, two statistical regularities observed in naturally generated text. One of the main findings concerns the influence of the temperature parameter on the generated text: LSTM-generated text improves as the temperature increases. A comparison between GPT-2 and LSTM output also shows that text generated by GPT-2 is more similar to natural text than that generated by LSTMs.
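The temperature parameter mentioned above scales the model's output logits before softmax sampling: higher values flatten the distribution and yield more diverse text, lower values sharpen it toward the most likely token. A minimal sketch of that sampling step (the function name and NumPy implementation are illustrative, not taken from the paper's notebooks):

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token index from logits scaled by temperature.

    Higher temperature -> flatter distribution -> more diverse text;
    lower temperature -> sharper distribution -> more repetitive text.
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()                      # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(rng.choice(len(probs), p=probs))
```

At very low temperature this collapses to greedy (argmax) decoding, which is one way to see why raising the temperature increases diversity.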

Sherlock.txt: Dataset

LSTM_Text_Generator_colab.ipynb: Explored and cleaned the dataset, tuned hyper-parameters, and built the LSTM model
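For reference, each step of an LSTM combines input, forget, and output gates with a candidate state to update its memory. A minimal NumPy sketch of one cell step (the function name and stacked weight layout are assumptions for illustration, not the notebook's actual Keras code):

```python
import numpy as np

def lstm_cell(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,) hold the
    stacked parameters for the i, f, o gates and candidate g."""
    H = h.size
    z = W @ x + U @ h + b                       # stacked pre-activations, shape (4H,)
    i, f, o = (1.0 / (1.0 + np.exp(-z[k*H:(k+1)*H])) for k in range(3))
    g = np.tanh(z[3*H:])                        # candidate cell state
    c_new = f * c + i * g                       # forget old memory, write new
    h_new = o * np.tanh(c_new)                  # gated hidden state
    return h_new, c_new
```

The gating is what lets the model carry character- or word-level context across long spans of generated text.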

LSTM_Text_Exploration.ipynb: Generated text using the model built in the previous notebook and verified Zipf's and Heaps' laws on the generated text
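Both laws can be checked empirically on a token stream: Zipf's law predicts word frequency falls roughly as 1/rank^a, and Heaps' law predicts vocabulary size grows as V(n) ~ K * n^b. A sketch of the two measurements (function names are illustrative; the notebook's code may differ):

```python
from collections import Counter
import numpy as np

def zipf_exponent(tokens):
    """Estimate the Zipf exponent a as the negated slope of
    log(frequency) vs log(rank)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    ranks = np.arange(1, len(freqs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope

def heaps_curve(tokens):
    """Vocabulary size V(n) after reading the first n tokens."""
    seen, curve = set(), []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((n, len(seen)))
    return curve
```

For natural text the Zipf exponent is typically close to 1, so comparing this fit for LSTM- and GPT-2-generated text against the source corpus is one concrete way to quantify how "natural" the generated text is.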

If you use this work, please cite it as:

Sumedha, Rohilla, R. (2021). Empirical Laws of Natural Language Processing for Neural Language Generated Text. In: Bhattacharya, M., Kharb, L., Chahal, D. (eds) Information, Communication and Computing Technology. ICICCT 2021. Communications in Computer and Information Science, vol 1417. Springer, Cham. https://doi.org/10.1007/978-3-030-88378-2_15