Enter a query term to compute TF-IDF scores across all scanned documents and rank them by relevance.
How TF-IDF Works
TF — Term Frequency
TF(t, d) = count(t in d) ÷ |d|
How often the term appears in a single document, normalised by document length, where |d| is the total number of words in d. A term appearing 5 times in a 100-word document has TF = 5 ÷ 100 = 0.05.
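The TF formula above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the function name term_frequency is hypothetical, and lowercasing plus whitespace splitting is an assumed tokenisation that the page does not specify.

```python
def term_frequency(term, document):
    """TF(t, d) = count(t in d) / |d|, with |d| counted in tokens."""
    # Assumption: tokens are lowercased, whitespace-separated words.
    tokens = document.lower().split()
    if not tokens:
        return 0.0  # assumption: an empty document scores 0 rather than raising
    return tokens.count(term.lower()) / len(tokens)


# A 100-word document in which "cat" appears 5 times.
example_document = " ".join(["cat"] * 5 + ["dog"] * 95)
print(term_frequency("cat", example_document))
```

Dividing by the token count is what keeps long documents from outranking short ones simply because they contain more words.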
IDF — Inverse Document Frequency
IDF(t, D) = log₂( N ÷ df(t) )
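Penalises terms that appear in many documents (common words score low). N is the total number of documents in the corpus D, and df(t) is the number of documents that contain the term. Uses log base 2. A term in every document has IDF = log₂(1) = 0. A term in 1 of 10 documents has IDF = log₂(10) ≈ 3.32.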
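A minimal Python sketch of the IDF formula follows, taking N as the number of documents and df(t) as the number of documents containing the term. The function name inverse_document_frequency is hypothetical, the whitespace tokenisation is an assumption, and returning 0 for a term that appears in no document is one possible choice; the page does not say how that case is handled.

```python
import math


def inverse_document_frequency(term, documents):
    """IDF(t, D) = log2(N / df(t))."""
    term = term.lower()
    n = len(documents)
    # df(t): number of documents containing the term at least once.
    # Assumption: tokens are lowercased, whitespace-separated words.
    df = sum(1 for doc in documents if term in doc.lower().split())
    if df == 0:
        return 0.0  # assumption: avoids division by zero for unseen terms
    return math.log2(n / df)


# "rare" occurs in 1 of 10 documents, "common" occurs in all 10.
corpus = ["common rare"] + ["common filler"] * 9
print(inverse_document_frequency("rare", corpus))
print(inverse_document_frequency("common", corpus))
```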
TF-IDF — The Score
TF-IDF(t, d, D) = TF(t, d) × IDF(t, D)
Combines both signals. A high TF-IDF means the term is frequent in this document and rare across the corpus — a strong relevance indicator. Documents are then ranked in decreasing order of TF-IDF.
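Putting the pieces together, scoring and ranking for a single query term might look like the sketch below. It is an illustration under stated assumptions rather than the tool's implementation: rank_documents and tokenize are hypothetical names, tokenisation is lowercased whitespace splitting, and a term found in no document is given an IDF of 0.

```python
import math


def tokenize(text):
    # Assumption: lowercased, whitespace-separated words.
    return text.lower().split()


def rank_documents(term, documents):
    """Return (document index, TF-IDF score) pairs, highest score first."""
    term = term.lower()
    tokenized = [tokenize(doc) for doc in documents]
    n = len(tokenized)
    df = sum(1 for tokens in tokenized if term in tokens)
    # Assumption: a term found in no document gets IDF = 0.
    idf = math.log2(n / df) if df else 0.0
    scores = []
    for index, tokens in enumerate(tokenized):
        tf = tokens.count(term) / len(tokens) if tokens else 0.0
        scores.append((index, tf * idf))
    # Decreasing order of TF-IDF; ties keep their original document order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    "the cat sat on the mat",
    "the dog chased the cat cat",
    "birds sing at dawn",
    "dogs bark",
]
for index, score in rank_documents("cat", docs):
    print(index, round(score, 3))
```

In this sample, "cat" appears in 2 of 4 documents, so IDF = log₂(2) = 1 and each score equals the document's TF. The second document (2 of 6 words) ranks above the first (1 of 6 words), and the two documents without the term score 0.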