
Empirical Laws of Natural Language Processing for Neural Language Generated Text

Many sequence models generate remarkably human-like text, but comparatively little research has examined how closely their output matches human-written text.

In this work, text is generated using Long Short-Term Memory networks (LSTMs) and the Generative Pretrained Transformer-2 (GPT-2). The text produced by both models follows Zipf's law and Heaps' law, two statistical regularities observed in naturally generated text. One of the main findings concerns the influence of the temperature parameter on the generated text: LSTM-generated text improves as the temperature increases. A comparison between GPT-2 and LSTM output also shows that text generated by GPT-2 is more similar to natural text than that generated by LSTMs.
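The temperature parameter mentioned above scales the model's output logits before softmax sampling: higher values flatten the distribution and yield more diverse text, lower values sharpen it toward the most likely token. A minimal sketch of that sampling step (the function name and NumPy implementation are illustrative, not taken from the paper's notebooks):

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token index from logits scaled by temperature.

    Higher temperature -> flatter distribution -> more diverse text;
    lower temperature -> sharper distribution -> more repetitive text.
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()                      # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(rng.choice(len(probs), p=probs))
```

At very low temperature this collapses to greedy (argmax) decoding, which is one way to see why raising the temperature increases diversity.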

Sherlock.txt: Dataset

LSTM_Text_Generator_colab.ipynb: Explored and cleaned the dataset, tuned hyper-parameters, and built the LSTM model
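For reference, each step of an LSTM combines input, forget, and output gates with a candidate state to update its memory. A minimal NumPy sketch of one cell step (the function name and stacked weight layout are assumptions for illustration, not the notebook's actual Keras code):

```python
import numpy as np

def lstm_cell(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,) hold the
    stacked parameters for the i, f, o gates and candidate g."""
    H = h.size
    z = W @ x + U @ h + b                       # stacked pre-activations, shape (4H,)
    i, f, o = (1.0 / (1.0 + np.exp(-z[k*H:(k+1)*H])) for k in range(3))
    g = np.tanh(z[3*H:])                        # candidate cell state
    c_new = f * c + i * g                       # forget old memory, write new
    h_new = o * np.tanh(c_new)                  # gated hidden state
    return h_new, c_new
```

The gating is what lets the model carry character- or word-level context across long spans of generated text.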

LSTM_Text_Exploration.ipynb: Generated text using the model built in the previous notebook and verified Zipf's and Heaps' laws on the generated text
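Both laws can be checked empirically on a token stream: Zipf's law predicts word frequency falls roughly as 1/rank^a, and Heaps' law predicts vocabulary size grows as V(n) ~ K * n^b. A sketch of the two measurements (function names are illustrative; the notebook's code may differ):

```python
from collections import Counter
import numpy as np

def zipf_exponent(tokens):
    """Estimate the Zipf exponent a as the negated slope of
    log(frequency) vs log(rank)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    ranks = np.arange(1, len(freqs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope

def heaps_curve(tokens):
    """Vocabulary size V(n) after reading the first n tokens."""
    seen, curve = set(), []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((n, len(seen)))
    return curve
```

For natural text the Zipf exponent is typically close to 1, so comparing this fit for LSTM- and GPT-2-generated text against the source corpus is one concrete way to quantify how "natural" the generated text is.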

If you use this work, please cite it as:

Sumedha, Rohilla, R. (2021). Empirical Laws of Natural Language Processing for Neural Language Generated Text. In: Bhattacharya, M., Kharb, L., Chahal, D. (eds) Information, Communication and Computing Technology. ICICCT 2021. Communications in Computer and Information Science, vol 1417. Springer, Cham. https://doi.org/10.1007/978-3-030-88378-2_15