
AnsImran/Transformer_from_Scratch_for_Text_Summarization


Note:

This version of the project handles around a thousand viewers per week, but it is not designed to scale beyond that.

If you want a highly scalable version, contact me at ansimran@protonmail.com.

The main goal here is to showcase Machine Learning skills, not full-stack AI development skills.

For an example of my work in full-stack AI development with scalability in mind, you can check this project: 👉 Production-ready self-corrective RAG.



tags: PyTorch, NumPy, pandas, Data Processing, Tokenization, Padding, Generator, Positional-Encoding, Padding-Mask, Look-Ahead-Mask, Encoder, Decoder, MultiHead-Self-Attention, Residual-Connections, Layer Normalization, Feed-Forward-Neural-Networks, Embedding-Layer, Dropout-Layer, Masked-MultiHead-Self-Attention-(Causal-Attention), Linear-Layer, Log-Softmax, Training-Loop, Epochs, Learning Rate, Batch-Size, Pad-Index, Loss-Function, Optimizer, Predictions, Gradients & Updating Weights.



Building a Transformer from Scratch for Text Summarization

Based on the Natural Language Processing Specialization by DeepLearning.ai

Course 4 – Week 2

📘 Full NLP Specialization GitHub repo: Natural Language Processing from Scratch


Results

(Result screenshots 1 and 2 are shown as images in the repository.)


1. Data Processing

  • Loading
  • Preprocessing
  • Tokenization
  • Padding
  • Generator
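
The padding and generator steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `PAD_ID`, the helper names, and the batch size are all assumptions.

```python
import torch

PAD_ID = 0  # assumed pad token id

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad variable-length token-id lists to the longest one."""
    max_len = max(len(s) for s in sequences)
    return torch.tensor([s + [pad_id] * (max_len - len(s)) for s in sequences])

def batch_generator(token_lists, batch_size=2):
    """Yield padded (batch_size, max_len) tensors, one batch at a time."""
    for i in range(0, len(token_lists), batch_size):
        yield pad_batch(token_lists[i:i + batch_size])

batches = list(batch_generator([[5, 6, 7], [8, 9], [10]], batch_size=2))
```

Using a generator keeps only one padded batch in memory at a time, which matters when the article/summary corpus is large.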

2. Useful Functions

  • Positional Encoding
  • Padding Mask
  • Look Ahead Mask
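
The three helpers in this section can be sketched as below. The function names and mask conventions (boolean, `True` = attend) are illustrative assumptions, not the repository's actual API; the positional-encoding formula is the standard sinusoidal one from "Attention Is All You Need".

```python
import torch

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding: sin on even dims, cos on odd dims."""
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-torch.log(torch.tensor(10000.0)) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

def padding_mask(token_ids, pad_id=0):
    """True where attention is allowed (i.e. non-pad positions)."""
    return token_ids != pad_id

def look_ahead_mask(size):
    """Lower-triangular mask: position i may attend only to positions <= i."""
    return torch.tril(torch.ones(size, size, dtype=torch.bool))
```

The look-ahead (causal) mask is what prevents the decoder from peeking at future summary tokens during training.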

3. Transformer (Encoder-Decoder)

  • Encoder Layer
    • MultiHead Self-Attention
    • Residual Connection & Layer Normalization
    • Feed Forward Neural Network
    • Residual Connection & Layer Normalization
  • Full Encoder
    • Embedding Layer
    • Positional Encoding
    • Dropout Layer
    • Stacked Encoder Layers
  • Decoder Layer
    • Masked MultiHead Self-Attention (Causal Attention)
    • Residual Connection & Layer Normalization
    • MultiHead Cross-Attention
    • Residual Connection & Layer Normalization
    • Feed Forward Neural Network
    • Residual Connection & Layer Normalization
  • Full Decoder
    • Embedding Layer
    • Positional Encoding
    • Dropout Layer
    • Stacked Decoder Layers
  • Full Transformer
    • Encoder + Decoder + Linear Layer
    • Log Softmax
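
A condensed sketch of one encoder layer, assuming the standard Transformer arrangement (self-attention and feed-forward sublayers, each wrapped in a residual connection plus layer normalization). All sizes are illustrative, and `nn.MultiheadAttention` stands in for a hand-written attention module:

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder layer: self-attention + FFN, each with residual + norm."""
    def __init__(self, d_model=32, n_heads=4, d_ff=64, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, key_padding_mask=None):
        attn_out, _ = self.attn(x, x, x, key_padding_mask=key_padding_mask)
        x = self.norm1(x + attn_out)      # residual connection + normalization
        x = self.norm2(x + self.ffn(x))   # residual connection + normalization
        return x

layer = EncoderLayer()
out = layer(torch.zeros(2, 5, 32))        # (batch, seq_len, d_model)
```

The full encoder stacks several such layers after the embedding, positional-encoding, and dropout stages; a decoder layer adds a masked self-attention sublayer and a cross-attention sublayer over the encoder output.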

4. Training Loop

  • Epochs, Learning Rate, Batch Size and Pad-Index
  • Loss Function
  • Optimizer
  • Computing Loss
  • Predictions
  • Clearing Gradients
  • Updating Weights
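
The loop ingredients listed above can be sketched with a toy stand-in model. The model, sizes, and learning rate here are assumptions; only the loop structure (compute loss on log-probabilities, clear gradients, backpropagate, update weights, with the pad index ignored) mirrors the outline:

```python
import torch
import torch.nn as nn

VOCAB, PAD_ID = 50, 0  # assumed vocabulary size and pad index

# Toy stand-in for the transformer: ends in log-softmax, so NLLLoss applies.
model = nn.Sequential(nn.Embedding(VOCAB, 16), nn.Linear(16, VOCAB),
                      nn.LogSoftmax(dim=-1))
loss_fn = nn.NLLLoss(ignore_index=PAD_ID)   # pad tokens contribute no loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

inputs = torch.randint(1, VOCAB, (4, 7))    # (batch_size, seq_len)
targets = torch.randint(1, VOCAB, (4, 7))

losses = []
for epoch in range(3):
    log_probs = model(inputs)                              # predictions
    loss = loss_fn(log_probs.view(-1, VOCAB), targets.view(-1))
    optimizer.zero_grad()                                  # clear gradients
    loss.backward()                                        # compute gradients
    optimizer.step()                                       # update weights
    losses.append(loss.item())
```

Setting `ignore_index=PAD_ID` is what keeps padded positions from dragging the loss (and gradients) toward predicting the pad token.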

5. Inference

  • Next Word Prediction Function
  • Summarization Function
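
A sketch of the two functions above as greedy decoding, assuming `model(src, tgt)` returns log-probabilities of shape (batch, tgt_len, vocab). The SOS/EOS ids and the toy model are invented for illustration and are not the repository's actual interface:

```python
import torch

SOS_ID, EOS_ID = 1, 2  # assumed start/end-of-sequence token ids

def next_word(model, src, tgt_so_far):
    """Pick the most probable next token id (greedy decoding)."""
    log_probs = model(src, tgt_so_far)
    return log_probs[0, -1].argmax().item()

def summarize(model, src, max_len=20):
    """Repeatedly predict the next word until EOS or max_len."""
    tgt = [SOS_ID]
    for _ in range(max_len):
        tok = next_word(model, src, torch.tensor([tgt]))
        tgt.append(tok)
        if tok == EOS_ID:
            break
    return tgt

# Toy stand-in model that always prefers EOS, just to exercise the loop.
def toy_model(src, tgt):
    out = torch.zeros(1, tgt.shape[1], 10)
    out[:, :, EOS_ID] = 1.0
    return out

summary = summarize(toy_model, torch.zeros(1, 5))
```

Greedy argmax is the simplest decoding strategy; sampling or beam search would slot into `next_word` without changing the surrounding loop.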

6. Conclusion

  • Some Remarks on Results
