GitHub - nausheenfatma/Spell-Check-Using-Bigram-Language-Modelling: This project provides spell-check help for Urdu-to-Hindi transliteration. The spelling errors in our case consist mostly of errors in matras (vowel signs).

1) Tokenise the Hindi corpus.
python indic_tokenizer.py --i corpus.txt --o corpus_tokenised.txt --l hin


indic_tokenizer is a sentence tokenizer for Indian languages. "hin" is the language code for Hindi.

2) Run the spell checker (give an input sentence and the index of the word under consideration).
python BigramModelSpellCheck.py

The code returns a ranked list of word suggestions, with the most probable word at the top.
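
To illustrate how a bigram language model can rank suggestions for one word of a sentence, here is a minimal sketch. It is not the code in BigramModelSpellCheck.py. The function names, the add-one smoothing, the toy corpus, and the candidate list are all assumptions made for illustration; in particular, the candidates are passed in by hand here, whereas a real checker would generate them (for example by varying matras).

```python
from collections import Counter


def train_bigram_counts(sentences):
    """Count unigrams and bigrams over whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev) (smoothing is an assumption)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def rank_candidates(sentence, index, candidates, unigrams, bigrams):
    """Rank candidates for the word at `index` using its left and right neighbours."""
    padded = ["<s>"] + sentence.split() + ["</s>"]
    pos = index + 1  # shift by one because of the <s> marker
    prev_word, next_word = padded[pos - 1], padded[pos + 1]
    scored = []
    for cand in candidates:
        score = (bigram_prob(prev_word, cand, unigrams, bigrams)
                 * bigram_prob(cand, next_word, unigrams, bigrams))
        scored.append((score, cand))
    # Highest score first, so the most probable word is at the top.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored]


# Toy corpus and hand-written candidates (hypothetical data).
corpus = [
    "मैं किताब पढ़ता हूँ",
    "वह किताब पढ़ती है",
    "यह किताब अच्छी है",
]
unigrams, bigrams = train_bigram_counts(corpus)

# The word at index 1 has a matra error; rank possible replacements.
ranked = rank_candidates("मैं कीताब पढ़ता हूँ", 1,
                         ["कीताब", "किताब", "कताब"],
                         unigrams, bigrams)
print(ranked)
```

In this toy setup the correctly spelled word is ranked first because it is the only candidate seen in the corpus next to both neighbouring words; the unseen candidates receive only the smoothed probability.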