Postings [1, 9] → gaps [1, 8] (9−1=8)
VBC is lossless — you can decode it back to the exact original gap values and reconstruct the posting list perfectly.
Lossless: Original data can be perfectly reconstructed (VBC, Gamma, Gap encoding)
Lossy: Some information is permanently lost (case folding, stemming, stop word removal)
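The VBC losslessness claim can be demonstrated with a quick round-trip sketch (assuming the usual convention: 7-bit groups per byte, high bit set on the last byte of each number):

```python
def vb_encode_number(n):
    # split n into 7-bit groups, most significant group first
    out = []
    while True:
        out.insert(0, n % 128)
        if n < 128:
            break
        n //= 128
    out[-1] += 128          # high bit marks the final byte of this number
    return out

def vb_encode(gaps):
    stream = []
    for g in gaps:
        stream.extend(vb_encode_number(g))
    return stream

def vb_decode(stream):
    gaps, n = [], 0
    for b in stream:
        if b < 128:         # continuation byte
            n = 128 * n + b
        else:               # final byte: strip marker bit, emit number
            gaps.append(128 * n + (b - 128))
            n = 0
    return gaps

gaps = [1, 8]
assert vb_decode(vb_encode(gaps)) == gaps   # lossless round-trip
```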
Write the gap sequence for "expensive" in gamma code.
expensive → [1, 3]
Postings [1, 3] → gaps [1, 2] (3−1=2) → γ(1) = 0, γ(2) = 100
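A sanity-check sketch for the standard gamma code (unary length prefix, then the binary offset with the leading 1 dropped):

```python
def gamma_encode(n):
    # γ(n): unary code for len(offset), then offset = binary(n) without leading 1
    assert n >= 1
    offset = bin(n)[3:]                 # drop '0b1'
    return '1' * len(offset) + '0' + offset

def gamma_decode(bits):
    nums, i = [], 0
    while i < len(bits):
        l = 0
        while bits[i] == '1':           # read unary length
            l += 1; i += 1
        i += 1                          # skip terminating '0'
        nums.append(int('1' + bits[i:i + l], 2))
        i += l
    return nums

print(gamma_encode(1))   # 0
print(gamma_encode(2))   # 100
```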
Create the Inverted Index. No normalization required.
| Term | df | Posting List |
|---|---|---|
| buy | 3 | 1 → 4 → 9 |
| expensive | 2 | 1 → 9 |
| phone | 1 | 1 |
| computer | 1 | 4 |
That's it. No Boolean queries, no term-doc matrix. ~1 minute.
Create a full inverted index.
| Term | df | Posting List |
|---|---|---|
| buy | 3 | 1 → 3 → 13 |
| expensive | 2 | 1 → 3 |
| phone | 1 | 1 |
Same format, different doc IDs. On the exam this feeds directly into compression.
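Index construction is mechanical. A minimal sketch — the document texts here are hypothetical, reconstructed only to match the table above:

```python
from collections import defaultdict

# hypothetical toy docs consistent with the posting lists above
docs = {
    1: "buy expensive phone",
    3: "buy expensive",
    13: "buy",
}

index = defaultdict(list)                   # term -> sorted posting list
for doc_id in sorted(docs):
    for term in set(docs[doc_id].split()):  # dedupe terms within a doc
        index[term].append(doc_id)

for term in sorted(index):
    print(term, len(index[term]), index[term])   # term, df, postings
```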
Calculate the Damerau-Levenshtein distance of "bca" and "cba". Show the matrix and explain each step.
|  | ε | c | b | a |
|---|---|---|---|---|
| ε | 0 | 1 | 2 | 3 |
| b | 1 | 1 | 1 | 2 |
| c | 2 | 1 | 1★ | 2 |
| a | 3 | 2 | 2 | 1 |
★ Transposition fires at d[2][2]: source "bc" matches target "cb" (adjacent swap) → d[0][0]+1 = 1. Final cell inherits the 1 since s[3]=t[3]='a' (match).
Calculate the Damerau-Levenshtein distance between "abc" and "acb". Fill in a matrix. Explain.
|  | ε | a | c | b |
|---|---|---|---|---|
| ε | 0 | 1 | 2 | 3 |
| a | 1 | 0 | 1 | 2 |
| b | 2 | 1 | 1 | 1 |
| c | 3 | 2 | 1 | 1★ |
★ Transposition fires at d[3][3]: source "bc" matches target "cb" → d[1][1]+1 = 0+1 = 1.
Exam tip: Both exam versions are distance=1 with one transposition. The exam tests whether you can fill the matrix and narrate the transposition step.
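The matrix fill in both versions can be checked against a short restricted Damerau-Levenshtein (optimal string alignment) implementation:

```python
def dl_distance(s, t):
    # restricted Damerau-Levenshtein: edit distance + adjacent transposition
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1): d[i][0] = i
    for j in range(n + 1): d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

assert dl_distance("bca", "cba") == 1
assert dl_distance("abc", "acb") == 1
```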
Append $: t*o$ → Rotate so * is at end: o$t* → Search for prefix o$t
Matches: o$t (from "to"), o$tw (from "two"), o$to (from "too")
Result: to, two, too
Append $: *he$ → Rotate so * is at end: he$* → Search for prefix he$
Matches: he$ (from "he"), he$s (from "she")
Result: he, she
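The rotate-and-prefix-match procedure for both queries can be sketched as (single-`*` queries only; vocabulary from the exam examples):

```python
def rotations(term):
    t = term + '$'
    return [t[i:] + t[:i] for i in range(len(t))]

def build_permuterm(vocab):
    # map every rotation of term$ back to its source term
    return {rot: w for w in vocab for rot in rotations(w)}

def wildcard_lookup(idx, query):
    # append $, rotate until * is at the end, then prefix-match
    q = query + '$'
    while not q.endswith('*'):
        q = q[1:] + q[0]
    prefix = q[:-1]
    return sorted({w for rot, w in idx.items() if rot.startswith(prefix)})

idx = build_permuterm(["to", "two", "too", "he", "she"])
print(wildcard_lookup(idx, "t*o"))   # ['to', 'too', 'two']
print(wildcard_lookup(idx, "*he"))   # ['he', 'she']
```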
Draw the documents into the vector space. No weighting or log-scaling required.
This question always leads into cosine similarity or HAC clustering in the next subtask.
Plot the documents and query into the vector space.
A sits on the x-axis, B on the y-axis, Q is in the upper-right quadrant. This setup feeds directly into cosine similarity.
Calculate cosine similarity between A=(2,0) and Q=(2,4), and between B=(0,4) and Q=(2,4). Which is more similar?
cos(A, Q) = (2×2 + 0×4) / (√4 × √20) = 4 / (2 × 4.47) = 4/8.94 = 0.447
cos(B, Q) = (0×2 + 4×4) / (√16 × √20) = 16 / (4 × 4.47) = 16/17.89 = 0.894
B is more similar (despite Q containing both "yes" and "no", Q's "no" count dominates)
Calculate cosine similarity between each document and Q. Which is more similar?
cos(A, Q) = (2×2 + 0×1) / (√4 × √5) = 4 / (2 × 2.236) = 4/4.472 = 0.894
cos(B, Q) = (0×2 + 1×1) / (√1 × √5) = 1 / 2.236 = 0.447
A is more similar to Q. ~1 minute with calculator. No TF-IDF, no log scaling — just raw counts.
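The arithmetic in both cosine questions reduces to one helper (vectors from this version):

```python
from math import sqrt

def cosine(u, v):
    # dot product over the product of Euclidean norms
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

A, B, Q = (2, 0), (0, 1), (2, 1)
print(round(cosine(A, Q), 3))   # 0.894
print(round(cosine(B, Q), 3))   # 0.447
```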
Draw the corresponding Precision-Recall graph (without interpolation). Give the P-R coordinates.
| Rank | Doc | Relevant? | P@k | R@k |
|---|---|---|---|---|
| 1 | 1 | ✓ | 1/1 = 1.000 | 1/5 = 0.200 |
| 2 | 3 | ✗ | 1/2 = 0.500 | 1/5 = 0.200 |
| 3 | 5 | ✗ | 1/3 = 0.333 | 1/5 = 0.200 |
| 4 | 9 | ✓ | 2/4 = 0.500 | 2/5 = 0.400 |
| 5 | 2 | ✓ | 3/5 = 0.600 | 3/5 = 0.600 |
(R, P) points: (0.2, 1.0), (0.2, 0.5), (0.2, 0.333), (0.4, 0.5), (0.6, 0.6)
Docs 4, 8 never retrieved → recall never reaches 1.0
Draw the complete precision-recall curve.
| Rank | Doc | Relevant? | P@k | R@k |
|---|---|---|---|---|
| 1 | 1 | ✓ | 1/1 = 1.000 | 1/5 = 0.200 |
| 2 | 3 | ✓ | 2/2 = 1.000 | 2/5 = 0.400 |
| 3 | 4 | ✓ | 3/3 = 1.000 | 3/5 = 0.600 |
| 4 | 7 | ✗ | 3/4 = 0.750 | 3/5 = 0.600 |
| 5 | 9 | ✗ | 3/5 = 0.600 | 3/5 = 0.600 |
| 6 | 11 | ✗ | 3/6 = 0.500 | 3/5 = 0.600 |
| 7 | 8 | ✓ | 4/7 = 0.571 | 4/5 = 0.800 |
(R, P) points: (0.2, 1.0), (0.4, 1.0), (0.6, 1.0), (0.6, 0.75), (0.6, 0.6), (0.6, 0.5), (0.8, 0.571)
Doc 2 never retrieved → recall never reaches 1.0. Note the first 3 results are all relevant — precision stays at 1.0 early.
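The P@k/R@k columns can be generated mechanically (ranking and relevant set from this version; 5 relevant docs total):

```python
def pr_points(ranking, relevant):
    # one (recall, precision) point per rank position
    points, hits = [], 0
    total_rel = len(relevant)
    for k, doc in enumerate(ranking, 1):
        if doc in relevant:
            hits += 1
        points.append((hits / total_rel, hits / k))
    return points

pts = pr_points([1, 3, 4, 7, 9, 11, 8], {1, 2, 3, 4, 8})
for r, p in pts:
    print(f"R={r:.1f}  P={p:.3f}")
```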
C(M + 1) or equivalently C + CM
O(C × t) — for each class, multiply t term probabilities
Why needed: Add 1 to every term count to prevent zero probabilities. A single unseen word zeros out the entire class probability.
Calculate ALL parameters of the Naive Bayes model with add-one smoothing.
Vocabulary: V = {Sausage, Cheese, Wine, Baguette} → B = |V| = 4
German: Sausage(2) Cheese(1) Wine(1) Baguette(0) = 4 total
French: Sausage(0) Cheese(1) Wine(3) Baguette(1) = 5 total
denom_German = 4 + 4 = 8 | denom_French = 5 + 4 = 9
| Term | count(Ger) | P(t|Ger) | count(Fr) | P(t|Fr) |
|---|---|---|---|---|
| Sausage | 2 | 3/8 | 0 | 1/9 |
| Cheese | 1 | 2/8 | 1 | 2/9 |
| Wine | 1 | 2/8 | 3 | 4/9 |
| Baguette | 0 | 1/8 | 1 | 2/9 |
This exam version STOPS here — no test document to classify.
Total parameters: 2 priors + 2×4 = 8 conditionals → 10 parameters
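The conditional-probability table can be reproduced with exact fractions (counts from above; B = 4):

```python
from fractions import Fraction

# term counts per class, from the tables above
counts = {
    "German": {"Sausage": 2, "Cheese": 1, "Wine": 1, "Baguette": 0},
    "French": {"Sausage": 0, "Cheese": 1, "Wine": 3, "Baguette": 1},
}
B = 4   # vocabulary size

probs = {}
for cls, tf in counts.items():
    total = sum(tf.values())                         # 4 for German, 5 for French
    for term, c in tf.items():
        probs[(term, cls)] = Fraction(c + 1, total + B)   # add-one smoothing

print(probs[("Sausage", "German")])   # 3/8
print(probs[("Wine", "French")])      # 4/9
```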
Draw the dendrograms for both single-link and complete-link clustering. Explain your solution and the difference.
Merge A,B (smallest distance = 2)
d({A,B}, C) = min(d(A,C), d(B,C)) = min(4.47, 4) = 4
Merge {A,B} with C at distance 4
d({A,B}, C) = max(d(A,C), d(B,C)) = max(4.47, 4) = 4.47
Merge {A,B} with C at distance 4.47
```
Single-Link:              Complete-Link:

4.47 |                    4.47 |__________
4    |__________          4    |    |
3    |    |               3    |    |
2    |__  |               2    |__  |
1    | |  |               1    | |  |
     A B  C                    A B  C
```
Single-link can create "chaining" (long, spread-out clusters). Complete-link creates more compact, spherical clusters.
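The linkage distances can be checked numerically. The coordinates below are hypothetical, chosen only to reproduce the given pairwise distances d(A,B)=2, d(B,C)=4, d(A,C)=√20≈4.47:

```python
from math import dist

pts = {"A": (0, 0), "B": (2, 0), "C": (2, 4)}   # assumed coordinates

# pairwise Euclidean distances, keyed by sorted label pair
d = {(p, q): dist(pts[p], pts[q]) for p in pts for q in pts if p < q}

# after merging {A,B} at distance 2, compare {A,B} vs C under both linkages
single   = min(d[("A", "C")], d[("B", "C")])    # single link: min
complete = max(d[("A", "C")], d[("B", "C")])    # complete link: max
print(round(single, 2), round(complete, 2))     # 4.0 4.47
```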
Draw the dendrograms for both single-link and complete-link clustering. Explain the difference.
Merge d₁,d₂ (smallest distance = 2)
d({d₁,d₂}, d₃) = min(4, 4.47) = 4
d({d₁,d₂}, d₃) = max(4, 4.47) = 4.47
Same structure as Version A — practice with different point labels to build fluency.
```
M = | 0    0.5  0.5 |
    | 0.5  0.5  0   |
    | 0.5  0    0.5 |
```

P = 0.85·M + (0.15/3)·1 = 0.85·M + 0.05·1, where 1 is the all-ones matrix

```
P = | 0.05   0.475  0.475 |
    | 0.475  0.475  0.05  |
    | 0.475  0.05   0.475 |
```
x¹ = P·x⁰ = (1/3, 1/3, 1/3) — already converged! (due to symmetric structure)
π = (1/3, 1/3, 1/3)
The process is called the Power Method or Power Iteration.
d₁→d₂, d₂→d₁, d₃→d₁, d₃→d₂
```
M:     d₁   d₂   d₃
d₁   [ 0    1    0.5 ]
d₂   [ 1    0    0.5 ]
d₃   [ 0    0    0   ]
```

```
P = [ 0.05  0.90  0.475 ]
    [ 0.90  0.05  0.475 ]
    [ 0.05  0.05  0.05  ]
```
x¹ = (0.475, 0.475, 0.05)
x² = (0.475, 0.475, 0.05) — converged after 1 step
PR(d₁) = PR(d₂) = 0.475 > PR(d₃) = 0.05 ✓
Key insight: symmetry between d₁↔d₂ guarantees equal PR. d₃ has no incoming links → it gets only teleportation (0.15/3 = 0.05).
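A power-iteration sketch using the column-stochastic M from this version (damping 0.85, uniform start vector):

```python
def power_iteration(M, damping=0.85, iters=50):
    # x <- damping * M x + (1 - damping)/n, starting from the uniform vector
    n = len(M)
    x = [1 / n] * n
    for _ in range(iters):
        x = [damping * sum(M[i][j] * x[j] for j in range(n)) + (1 - damping) / n
             for i in range(n)]
    return x

# column-stochastic M for d1->d2, d2->d1, d3->{d1, d2}
M = [[0, 1, 0.5],
     [1, 0, 0.5],
     [0, 0, 0]]

pr = power_iteration(M)
print([round(v, 3) for v in pr])   # [0.475, 0.475, 0.05]
```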
Lossless: Original data can be perfectly reconstructed — VBC, Gamma, Gap encoding, dictionary-as-a-string + blocking.
Lossy: Some information is permanently lost, but retrieval quality is acceptable.
Static index pruning: Drop postings for low-tf documents — a term that appears once in a 10,000-word document is unlikely to be relevant.
Given data points from two classes, draw a separation that could have been created by: (a) Perceptron (b) SVM (c) kNN with k=1.
| Classifier | Boundary Shape | Key Feature |
|---|---|---|
| SVM | Straight line | Maximum margin between classes (widest gap) |
| Perceptron | Straight line | Any separating line (not necessarily max margin) |
| Naive Bayes | Straight line (axis-aligned) | Based on feature independence assumption |
| kNN k=1 | Irregular, wrapping | Voronoi tessellation — follows nearest points |
| kNN large k | Smoother, less complex | More neighbors → smoother boundary |
| Doc | f₁ (He) | f₂ (street) | Class |
|---|---|---|---|
| She walks street | 0 | 1 | A |
| She walks beach | 0 | 0 | B |
| He walks street | 1 | 1 | B |
| He walks beach | 1 | 0 | A |
Class A when features differ, Class B when features match — this is XOR. No linear classifier can solve XOR → NB, linear SVM, and MaxEnt all fail.
Only kNN with k=1 works — each point is closest to itself.
Need features where classes cluster together. E.g., a single combined feature like "He XOR street".
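A tiny check that 1-NN fits the XOR data perfectly (each training point is its own nearest neighbour):

```python
# XOR data from the table: class A when features differ, B when they match
data = [((0, 1), "A"), ((1, 0), "A"), ((0, 0), "B"), ((1, 1), "B")]

def knn1(x):
    # 1-NN under squared Euclidean distance
    return min(data, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

# every training point has distance 0 to itself -> perfect fit on XOR
assert all(knn1(x) == y for x, y in data)
```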