TypoDetector
Sinhala spell checker using a character-level n-gram language model combined with edit-distance candidate generation.
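
To make the two ingredients concrete, here is a toy sketch of a character-level bigram scorer paired with edit-distance candidate lookup. It is not sinlib's implementation: the function names, the boundary markers `^` and `$`, the `floor` value for unseen bigrams, and the `max_distance` cutoff are all assumptions chosen for illustration.

```python
from collections import Counter


def train_bigram_probs(words):
    """Estimate P(next_char | char) from a word list, with boundary markers."""
    pair_counts = Counter()
    first_counts = Counter()
    for word in words:
        padded = "^" + word + "$"  # hypothetical start/end markers
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def word_probability(word, probs, floor=1e-6):
    """Multiply bigram probabilities; unseen bigrams get a small floor value."""
    padded = "^" + word + "$"
    score = 1.0
    for pair in zip(padded, padded[1:]):
        score *= probs.get(pair, floor)
    return score


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def closest_words(word, dictionary, max_distance=2):
    """Return dictionary words within max_distance, nearest first."""
    scored = [(edit_distance(word, w), w) for w in dictionary]
    return [w for d, w in sorted(scored) if d <= max_distance]
```

A word built from character pairs that are common in the training list scores high, while an unusual sequence scores close to zero and becomes a candidate for replacement by one of its nearest dictionary neighbours.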
Import

    from sinlib import TypoDetector

Quick Reference

| Method / Property | Returns | Description |
|---|---|---|
| `TypoDetector.from_pretrained(repo)` | `TypoDetector` | Load from HF Hub |
| `detector(text)` | `str` | Correct a sentence |
| `detector.suggest_correction(word)` | `list[str]` | Closest dictionary matches |
| `detector.word_ngram_probability(word)` | `float` | N-gram likelihood score |
| `detector.get_dictionary()` | `set[str]` | Full word list |
| `detector.get_ngram_probs()` | `dict` | Full n-gram table |
| `detector.dictionary` | `str` | Human-readable summary |
| `detector.ngram_probs` | `str` | Human-readable summary |
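
As a rough illustration of how these methods fit together, the helper below gathers what the detector reports about a single word. `summarize_word` is a hypothetical name, not part of sinlib; it calls only the methods listed in the table and assumes they return the types shown there.

```python
def summarize_word(detector, word):
    """Collect what the documented methods report about a single word.

    `detector` is expected to be a loaded TypoDetector; this helper itself
    is illustrative and not part of the library.
    """
    return {
        "word": word,
        "in_dictionary": word in detector.get_dictionary(),
        "ngram_probability": detector.word_ngram_probability(word),
        "suggestions": detector.suggest_correction(word),
    }
```

With a detector loaded via `TypoDetector.from_pretrained("Ransaka/sinlib")`, calling `summarize_word(detector, "අඩිරාජ")` would return a dictionary with these four keys.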
Correct a sentence

    from sinlib import TypoDetector

    detector = TypoDetector.from_pretrained("Ransaka/sinlib")
    detector("අපකරියට ගිය")
    # 'අපකීර්තියට ගිය'

Get correction suggestions

    detector.suggest_correction("අඩිරාජ")
    # ['අධිරාජ']
    detector.suggest_correction("xyz")
    # ['No suggestion']

Score a word

    prob = detector.word_ngram_probability("සිංහල")
    # 0.000032 (higher = more likely to be a real word)

Inspect the dictionary

    print(detector.dictionary)
    # Dictionary containing 45231 words.

    words = detector.get_dictionary()
    "ගෙදර" in words  # True

Behaviour Details
For each word in the input sentence, the detector:
- Checks whether the word is in the known dictionary; if it is, the word passes through unchanged.
- Estimates the word's character-level bigram probability.
  - If `prob < threshold` (default `1e-8`): replaces the word with the top `suggest_correction` result.
  - If `threshold <= prob < 1.0`: emits a `UserWarning` but keeps the word.
- On any processing error, emits a `UserWarning` and keeps the original word.
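
The rules above can be sketched as a small standalone function. This is a reading of the documented behaviour, not the library's source: `correct_word` and `correct_sentence` are hypothetical names, `score` and `suggest` stand in for `word_ngram_probability` and `suggest_correction`, and splitting the sentence on whitespace is an assumption.

```python
import warnings


def correct_word(word, dictionary, score, suggest, threshold=1e-8):
    """Apply the documented per-word rules to a single word (illustrative)."""
    # Rule 1: known words pass through unchanged.
    if word in dictionary:
        return word
    try:
        # Rule 2: estimate the probability, and look up candidates if it is low.
        prob = score(word)
        candidates = suggest(word) if prob < threshold else []
    except Exception as exc:
        # Rule 3: on any processing error, warn and keep the original word.
        warnings.warn(f"Could not process {word!r}: {exc}", UserWarning)
        return word
    if prob < threshold:
        # Assumption: fall back to the original word if nothing is suggested.
        return candidates[0] if candidates else word
    if prob < 1.0:
        warnings.warn(f"Possible typo kept unchanged: {word!r}", UserWarning)
    return word


def correct_sentence(text, dictionary, score, suggest, threshold=1e-8):
    """Correct each whitespace-separated token (tokenization is an assumption)."""
    return " ".join(
        correct_word(w, dictionary, score, suggest, threshold) for w in text.split()
    )
```

Passing the scoring and suggestion steps in as callables keeps the sketch independent of any loaded model, so the thresholds can be tried out with stand-in functions.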