CF 178C3 - Smart Beaver and Resolving Collisions
We maintain a hash table with h cells numbered from 0 to h - 1. Every insertion starts from the object's hash value and probes cells by repeatedly adding m modulo h until it finds an empty slot.
CF 178C3 - Smart Beaver and Resolving Collisions
Rating: 2000
Tags: -
Solve time: 2m 38s
Verified: yes
Solution
Problem Understanding
We maintain a hash table with h cells numbered from 0 to h - 1. Every insertion starts from the object's hash value and probes cells by repeatedly adding m modulo h until it finds an empty slot.
For example, if h = 10, m = 3, and the hash value is 4, the probing order is:
4 -> 7 -> 0 -> 3 -> 6 -> 9 -> ...
Every time insertion checks an occupied cell, that counts as one dummy call. We must process a sequence of insertions and deletions and compute the total number of dummy calls across all insertions.
The challenge is that both the table size and the number of operations can reach 2 * 10^5, while the time limit is only one second. A direct simulation of probing during every insertion is dangerous because a single insertion may scan almost the whole table. In the worst case, 2 * 10^5 insertions each taking O(h) work becomes roughly 4 * 10^10 operations, which is completely infeasible.
The structure of the probing sequence is the key observation. Starting from a hash value t, probing visits:
t, (t + m) mod h, (t + 2m) mod h, ...
This is not arbitrary movement through the table. It forms a fixed cyclic order. Once we understand that structure, the problem becomes maintaining the nearest free position in a cycle under insertions and deletions.
There are several subtle edge cases that break naive implementations.
Consider this input:
5 2 5
+ 1 0
+ 2 0
- 1
+ 3 0
+ 4 0
The probing sequence from 0 is:
0 -> 2 -> 4 -> 1 -> 3
After deleting object 1, cell 0 becomes free again. The next insertion with hash 0 must reuse cell 0 immediately. A careless implementation that only appends objects to the end of a chain would incorrectly place it later in the cycle.
Another tricky case appears when gcd(h, m) > 1.
6 2 4
+ 1 0
+ 2 0
+ 3 0
+ 4 1
From hash 0, probing only reaches even positions:
0 -> 2 -> 4 -> 0
From hash 1, probing only reaches odd positions:
1 -> 3 -> 5 -> 1
The table splits into independent cycles. Treating the entire table as one global circular array produces incorrect answers because objects from one cycle never interact with the other.
One more subtle point is that the number of dummy calls can exceed 32-bit integer range. If we repeatedly fill long probe chains, the total may reach around 10^10. The accumulator must use 64-bit arithmetic.
Approaches
The brute-force approach follows the statement literally.
For each insertion with hash t, repeatedly check:
t
(t + m) mod h
(t + 2m) mod h
...
until an empty cell appears. Count how many occupied cells were encountered before reaching the empty one.
This method is correct because it exactly simulates linear probing. Deletion is simple because we already know the occupied position of every object by its identifier.
The problem is complexity. In the worst case, almost every insertion scans nearly the whole cycle before finding a free slot. With h = 2 * 10^5 and n = 2 * 10^5, the total work becomes quadratic.
The optimization comes from looking at the probing order differently.
Fix some starting hash t. The sequence
t, (t + m) mod h, (t + 2m) mod h, ...
always walks through the same cycle in the same order. Instead of thinking about the actual cell numbers, we can assign every reachable position an index inside its cycle.
Suppose a cycle contains:
[0, 2, 4, 6]
If cells 0 and 2 are occupied while 4 is free, then inserting with hash 0 simply asks:
"How many consecutive occupied positions appear starting from index 0 before the first free position?"
This becomes a dynamic next-free-position problem.
We can solve it with a disjoint set union structure that skips occupied positions. Every occupied slot points to the next candidate position. Finding the representative gives the first free index in almost constant amortized time.
The table may contain several disconnected probing cycles because of gcd(h, m). We process every cycle independently.
| Approach | Time Complexity | Space Complexity | Verdict |
|---|---|---|---|
| Brute Force | O(nh) | O(h) | Too slow |
| Optimal | O(h + n α(h)) | O(h) | Accepted |
Algorithm Walkthrough
- Build all probing cycles of the hash table.
Starting from every unvisited cell, repeatedly jump by m modulo h until returning to the start. Each traversal forms one cycle.
2. For every cycle, assign each table position an index inside that cycle.
We store:
cycle_id[position]
index_in_cycle[position]
This converts probing into movement along a simple circular array.
3. For every cycle of length L, create a DSU array of size 2L + 1.
We duplicate the cycle so wrap-around queries become linear interval queries. The extra sentinel node 2L represents "no free position".
4. Initially, every DSU node is its own parent.
A node represents a free position. Once a position becomes occupied, we union it with the next position so future searches skip it.
5. During insertion with hash t, find:
c = cycle_id[t]
p = index_in_cycle[t]
Then query the DSU for the first free index at or after p.
6. If the returned index is at least p + L, wrap-around occurred.
The actual slot inside the cycle is:
real_index = found % L
7. The number of dummy calls equals:
found - p
because every earlier position in the probing order was occupied. 8. Mark that position occupied by unioning it with the next index in both copies of the duplicated cycle.
This keeps future searches skipping already occupied cells. 9. Store the chosen table cell for this object identifier so deletion can free it later. 10. During deletion, recover the occupied cell and restore both duplicated copies as free DSU roots.
Why it works:
Inside one cycle, probing always moves forward by one logical step in the cycle ordering. The DSU invariant is that find(x) always returns the first free position at or after index x. Occupied positions are linked to their successor, so path compression skips entire occupied blocks in near constant time. Duplicating the cycle removes special wrap-around handling because every probe sequence of length at most L becomes a contiguous interval in the doubled array. Since insertion always chooses the first free position in probing order, the computed dummy count is exactly the number of occupied positions skipped.
Python Solution
import sys
input = sys.stdin.readline
def solve():
h, m, n = map(int, input().split())
visited = [False] * h
cycle_id = [-1] * h
pos_in_cycle = [-1] * h
cycles = []
for start in range(h):
if visited[start]:
continue
cur = start
cyc = []
while not visited[cur]:
visited[cur] = True
pos_in_cycle[cur] = len(cyc)
cycle_id[cur] = len(cycles)
cyc.append(cur)
cur = (cur + m) % h
cycles.append(cyc)
parents = []
lengths = []
for cyc in cycles:
L = len(cyc)
lengths.append(L)
parent = list(range(2 * L + 1))
parents.append(parent)
def find(parent, x):
while parent[x] != x:
parent[x] = parent[parent[x]]
x = parent[x]
return x
occupied = {}
answer = 0
for _ in range(n):
parts = input().split()
if parts[0] == '+':
obj_id = int(parts[1])
hsh = int(parts[2])
cid = cycle_id[hsh]
pos = pos_in_cycle[hsh]
parent = parents[cid]
L = lengths[cid]
nxt = find(parent, pos)
if nxt >= pos + L:
nxt = find(parent, pos + L)
answer += nxt - pos
real = nxt % L
cell = cycles[cid][real]
occupied[obj_id] = (cid, real, cell)
for idx in (real, real + L):
parent[idx] = find(parent, idx + 1)
else:
obj_id = int(parts[1])
cid, real, cell = occupied.pop(obj_id)
parent = parents[cid]
L = lengths[cid]
parent[real] = real
parent[real + L] = real + L
print(answer)
solve()
The first section constructs the probing cycles. This is the structural heart of the solution. Because repeatedly adding m modulo h eventually repeats, every table position belongs to exactly one cycle.
The arrays cycle_id and pos_in_cycle convert physical table cells into logical positions inside a cycle. Once this mapping exists, probing no longer depends on modular arithmetic during queries.
The DSU structure stores the next free position. When a slot becomes occupied, we redirect it to the next candidate position. Path compression makes repeated searches extremely fast.
The duplicated interval is the most important implementation detail. Suppose a cycle has length 5 and probing starts near the end. Without duplication, wrap-around logic becomes messy. With duplication, every valid probing sequence becomes a normal forward interval inside [0, 2L).
Deletion restores both copies of the position. Forgetting the duplicated copy is a common bug because future wrapped searches would incorrectly treat the slot as occupied.
The answer accumulator uses Python integers, which automatically handle large values safely.
Worked Examples
Example 1
Input:
10 2 7
+ 11 0
+ 22 2
+ 33 6
+ 44 0
+ 55 0
- 22
+ 66 0
The cycle containing hash 0 is:
[0, 2, 4, 6, 8]
| Operation | Probe Start | First Free | Dummy Calls | Occupied Cells |
|---|---|---|---|---|
| +11 0 | 0 | 0 | 0 | {0} |
| +22 2 | 1 | 1 | 0 | {0,2} |
| +33 6 | 3 | 3 | 0 | {0,2,6} |
| +44 0 | 0 | 2 | 2 | {0,2,4,6} |
| +55 0 | 0 | 4 | 4 | {0,2,4,6,8} |
| -22 | - | - | 0 | {0,4,6,8} |
| +66 0 | 0 | 1 | 1 | {0,2,4,6,8} |
Total dummy calls:
0 + 0 + 0 + 2 + 4 + 1 = 7
This trace shows how deletions reopen earlier positions in the probing order. The insertion of 66 immediately reuses cell 2.
Example 2
Input:
6 2 5
+ 1 0
+ 2 0
+ 3 1
+ 4 1
+ 5 0
Cycles:
Even cycle: [0,2,4]
Odd cycle: [1,3,5]
| Operation | Cycle | Free Position | Dummy Calls |
|---|---|---|---|
| +1 0 | Even | 0 | 0 |
| +2 0 | Even | 2 | 1 |
| +3 1 | Odd | 1 | 0 |
| +4 1 | Odd | 3 | 1 |
| +5 0 | Even | 4 | 2 |
Total:
4
This example demonstrates why cycles are independent. Insertions starting from odd hashes never interact with even hashes.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(h + n α(h)) | Cycle construction is linear, DSU operations are almost constant |
| Space | O(h) | Cycles, DSU arrays, and mappings store linear data |
The inverse Ackermann factor from DSU is effectively constant in practice. With 2 * 10^5 operations, this comfortably fits inside the one second limit.
Test Cases
# helper: run solution on input string, return output string
import sys
import io
def run(inp: str) -> str:
sys.stdin = io.StringIO(inp)
input = sys.stdin.readline
h, m, n = map(int, input().split())
visited = [False] * h
cycle_id = [-1] * h
pos_in_cycle = [-1] * h
cycles = []
for start in range(h):
if visited[start]:
continue
cur = start
cyc = []
while not visited[cur]:
visited[cur] = True
pos_in_cycle[cur] = len(cyc)
cycle_id[cur] = len(cycles)
cyc.append(cur)
cur = (cur + m) % h
cycles.append(cyc)
parents = []
lengths = []
for cyc in cycles:
L = len(cyc)
lengths.append(L)
parents.append(list(range(2 * L + 1)))
def find(parent, x):
while parent[x] != x:
parent[x] = parent[parent[x]]
x = parent[x]
return x
occupied = {}
ans = 0
for _ in range(n):
parts = input().split()
if parts[0] == '+':
obj_id = int(parts[1])
hsh = int(parts[2])
cid = cycle_id[hsh]
pos = pos_in_cycle[hsh]
parent = parents[cid]
L = lengths[cid]
nxt = find(parent, pos)
if nxt >= pos + L:
nxt = find(parent, pos + L)
ans += nxt - pos
real = nxt % L
cell = cycles[cid][real]
occupied[obj_id] = (cid, real, cell)
for idx in (real, real + L):
parent[idx] = find(parent, idx + 1)
else:
obj_id = int(parts[1])
cid, real, cell = occupied.pop(obj_id)
parent = parents[cid]
parent[real] = real
parent[real + lengths[cid]] = real + lengths[cid]
return str(ans)
# provided sample
assert run(
"""10 2 7
+ 11 0
+ 22 2
+ 33 6
+ 44 0
+ 55 0
- 22
+ 66 0
"""
) == "7"
# minimum size cycle
assert run(
"""2 1 2
+ 1 0
+ 2 0
"""
) == "1"
# deletion restores earlier slot
assert run(
"""5 2 5
+ 1 0
+ 2 0
- 1
+ 3 0
+ 4 0
"""
) == "1"
# independent cycles
assert run(
"""6 2 5
+ 1 0
+ 2 0
+ 3 1
+ 4 1
+ 5 0
"""
) == "4"
# full wrap-around probing
assert run(
"""5 1 5
+ 1 0
+ 2 0
+ 3 0
+ 4 0
+ 5 0
"""
) == "10"
| Test input | Expected output | What it validates |
|---|---|---|
| Minimum table with two insertions | 1 | Smallest non-trivial probing chain |
| Deletion and reinsertion | 1 | Freed positions are reused correctly |
| Multiple probing cycles | 4 | Independent cycle handling |
| Full wrap-around | 10 | Correct counting across circular probing |
Edge Cases
Consider again the deletion reuse case:
5 2 5
+ 1 0
+ 2 0
- 1
+ 3 0
+ 4 0
The probing cycle is:
0 -> 2 -> 4 -> 1 -> 3
After the first two insertions, cells 0 and 2 are occupied. Deleting object 1 frees cell 0. The DSU restoration step sets that position back to its own parent, meaning future find() operations can stop there again. The insertion of object 3 immediately chooses cell 0 with zero dummy calls.
Now consider disconnected cycles:
6 2 4
+ 1 0
+ 2 0
+ 3 1
+ 4 1
The probing graph splits into:
Even cycle: 0 -> 2 -> 4
Odd cycle: 1 -> 3 -> 5
Objects inserted from hash 0 never inspect odd cells. The algorithm respects this because every cell stores its own cycle_id, and all DSU operations happen only inside that cycle.
Finally, consider complete wrap-around:
5 1 5
+ 1 0
+ 2 0
+ 3 0
+ 4 0
+ 5 0
The fifth insertion probes:
0 -> 1 -> 2 -> 3 -> 4
which gives four dummy calls. The duplicated DSU array handles this naturally because indices [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] represent two consecutive copies of the cycle, so wrapping never requires special modular-case logic.