Term Frequency · Inverse Document Frequency

Rank by relevance

Enter a query term to compute TF-IDF scores across all scanned documents and rank them by relevance.

How TF-IDF Works

TF — Term Frequency

TF(t, d) = count(t in d) ÷ |d|

How often the term appears in a single document, normalised by the document length |d| (the total number of words in d). A term appearing 5 times in a 100-word document has TF = 5 ÷ 100 = 0.05.
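The TF formula can be written as a minimal Python sketch. It assumes each document has already been split into a list of words, so |d| is simply the list length. The function name `term_frequency` is hypothetical and is not taken from the tool itself.

```python
from __future__ import annotations


def term_frequency(term: str, doc: list[str]) -> float:
    """TF(t, d) = count(t in d) / |d|, where |d| is the number of words in d."""
    if not doc:
        return 0.0  # assumption: an empty document has TF = 0 instead of raising
    return doc.count(term) / len(doc)


# Reproduce the worked example: 5 occurrences in a 100-word document.
doc = ["tfidf"] * 5 + ["filler"] * 95
print(term_frequency("tfidf", doc))  # → 0.05
```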

IDF — Inverse Document Frequency

IDF(t, D) = log₂( N ÷ df(t) )

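Penalises terms that appear in many documents, so common words score low. Here N is the total number of documents in the corpus D and df(t) is the number of documents that contain t; the logarithm is base 2. A term in every document has IDF = log₂(1) = 0. A term in 1 of 10 documents has IDF = log₂(10) ≈ 3.32.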
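A matching sketch of IDF, under the same assumption that documents are lists of words. The name `inverse_document_frequency` is hypothetical, and so is the handling of a term that appears in no document: log₂(N ÷ 0) is undefined, and this sketch simply returns 0 in that case (an assumption, not necessarily what the tool does).

```python
from __future__ import annotations

import math


def inverse_document_frequency(term: str, corpus: list[list[str]]) -> float:
    """IDF(t, D) = log2(N / df(t)), with N = number of documents in D."""
    n_docs = len(corpus)
    df = sum(1 for doc in corpus if term in doc)  # documents containing the term
    if df == 0:
        # log2(N / 0) is undefined; assumption: treat an unseen term as IDF 0
        return 0.0
    return math.log2(n_docs / df)


# 10 documents: "word" is in all of them, "rare" is in only the first.
corpus = [["rare", "word"]] + [["common", "word"] for _ in range(9)]
print(inverse_document_frequency("word", corpus))            # → 0.0
print(round(inverse_document_frequency("rare", corpus), 2))  # → 3.32
```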

TF-IDF — The Score

TF-IDF(t, d, D) = TF(t, d) × IDF(t, D)

Multiplying the two combines both signals. A high TF-IDF score means the term is frequent in this document and rare across the corpus, which makes it a strong relevance indicator. Documents are then ranked in decreasing order of TF-IDF score.
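Putting the pieces together, a rough sketch of the scoring and ranking step might look like the following. The helper names, the lowercase-plus-whitespace tokenisation, the sample file names and the choice to return 0 for unseen terms are all illustrative assumptions rather than the tool's actual implementation. For reference, combining the two worked examples above, a term with TF = 0.05 that appears in 1 of 10 documents scores 0.05 × log₂(10) ≈ 0.166.

```python
from __future__ import annotations

import math


def tf(term: str, doc: list[str]) -> float:
    # TF(t, d) = count(t in d) / |d|
    return doc.count(term) / len(doc) if doc else 0.0


def idf(term: str, corpus: list[list[str]]) -> float:
    # IDF(t, D) = log2(N / df(t)); assumption: unseen terms get 0
    df = sum(1 for doc in corpus if term in doc)
    return math.log2(len(corpus) / df) if df else 0.0


def rank_by_tfidf(term: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Score every document for `term` and sort by decreasing TF-IDF."""
    # Assumption: naive lowercase + whitespace tokenisation
    term = term.lower()
    names = list(docs)
    corpus = [docs[name].lower().split() for name in names]
    term_idf = idf(term, corpus)  # identical for every document
    scores = [(name, tf(term, doc) * term_idf) for name, doc in zip(names, corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = {  # hypothetical documents
    "a.txt": "The cat sat on the mat",
    "b.txt": "The dog chased the cat",
    "c.txt": "A bird sang",
    "d.txt": "cat cat cat",
}
for name, score in rank_by_tfidf("cat", docs):
    print(f"{name}: {score:.3f}")
```

Run as-is, this prints d.txt: 0.415, b.txt: 0.083, a.txt: 0.069 and c.txt: 0.000. For a single query term, IDF is the same for every document, so it only scales all scores by one common factor (or zeroes them all when the term occurs in every document); the ranking order itself comes from TF.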