CF 246E - Blood Cousins Return
We are given a rooted forest describing family relations. Every person has a name and at most one parent. Multiple roots are allowed because some people may have no ancestor at all.
CF 246E - Blood Cousins Return
Rating: 2400
Tags: binary search, data structures, dfs and similar, dp, sortings
Solve time: 1m 38s
Verified: yes
Solution
Problem Understanding
We are given a rooted forest describing family relations. Every person has a name and at most one parent. Multiple roots are allowed because some people may have no ancestor at all.
For each query (v, k), we need the number of distinct names among all descendants of v that are exactly k edges below v.
A useful rephrasing makes the structure clearer. Let depth[x] be the depth of node x from its tree root. A node is a k-son of v exactly when:
- it lies inside the subtree of
v - its depth is
depth[v] + k
So every query asks:
Inside the subtree of
v, how many distinct names appear at depthdepth[v] + k?
The tree size and number of queries both go up to 10^5. Any solution that scans a subtree for every query is immediately too slow. In the worst case, one subtree may contain almost all nodes, leading to roughly 10^10 operations.
The constraints strongly suggest an offline preprocessing approach. We need something around O(n log n) or O((n + m) log n).
Several edge cases are easy to mishandle.
Consider multiple roots:
4
a 0
b 0
c 1
d 2
2
1 1
2 1
The correct answers are:
1
1
A DFS started from only one root would completely miss part of the forest.
Another subtle case appears when the requested depth does not exist:
3
a 0
b 1
c 2
1
1 5
The answer is:
0
A careless implementation might try to access nonexistent depth structures.
Repeated names also matter:
5
john 0
john 1
john 1
alice 2
alice 3
2
1 1
1 2
At distance 1 from node 1, both descendants are named john, so the answer is 1, not 2.
Finally, queries with k = 0 are impossible in this problem because constraints start from 1. That removes one common special case.
Approaches
The brute-force solution is straightforward. For every query (v, k), traverse the subtree of v, collect all nodes whose depth equals depth[v] + k, insert their names into a set, and output the set size.
This is correct because it directly follows the definition of a k-son. The problem is cost. In a chain-like tree, a subtree can contain O(n) nodes. With m = 10^5 queries, the total work becomes O(nm).
We need to reuse work across queries.
The key observation is that every query only depends on two things:
- a subtree
- a single depth
Subtree queries on trees often become interval queries after an Euler tour. During DFS, each subtree becomes a contiguous segment [tin[v], tout[v]].
Now imagine processing nodes depth by depth. For each depth d, we maintain all names appearing at that depth, ordered by Euler time. Then a query becomes:
Among nodes at depth
d = depth[v] + k, count distinct names whose Euler time lies inside[tin[v], tout[v]].
The remaining challenge is counting distinct values inside a range efficiently.
This is where the classic offline distinct-query trick appears. Suppose we process Euler positions from left to right. For every name, we keep only its latest occurrence active in a Fenwick tree. Then the sum on a range gives the number of distinct names inside it.
We can group all nodes by depth and all queries by target depth. For each depth independently:
- Sort nodes by Euler order.
- Activate only the latest occurrence of every name.
- Answer all interval queries for that depth.
Each node and query participates only once, giving an O((n + m) log n) solution.
| Approach | Time Complexity | Space Complexity | Verdict |
|---|---|---|---|
| Brute Force | O(nm) | O(n) | Too slow |
| Optimal | O((n + m) log n) | O(n + m) | Accepted |
Algorithm Walkthrough
- Build the forest from the parent array.
Every node with parent 0 becomes a root. Store children adjacency lists.
2. Run DFS from every root.
During DFS, compute:
depth[v]tin[v]tout[v]
We increment a global timer on entry. Because of DFS ordering, every subtree forms a contiguous Euler interval. 3. Group nodes by depth.
For every node v, append (tin[v], name[v]) into depth_nodes[depth[v]].
Queries only compare nodes at one exact depth, so processing depths independently avoids unnecessary work.
4. Convert each query (v, k) into a depth-restricted interval query.
The required depth is:
target = depth[v] + k
The subtree interval is:
[tin[v], tout[v]]
Store queries grouped by target.
5. For each depth separately, process all nodes in Euler order.
Suppose we are processing depth d.
We sort all nodes at depth d by Euler time. Then we sweep from left to right.
Maintain:
- a Fenwick tree over Euler positions
last[name], the latest Euler position where this name appeared
When visiting (pos, name):
- if the name appeared before at
old, remove contribution atold - add contribution at
pos - update
last[name] = pos
At any moment, the Fenwick tree contains exactly one active occurrence per distinct name among processed positions. 6. Answer interval queries offline.
Sort queries for this depth by right endpoint.
While sweeping nodes up to the query's right boundary, update the Fenwick tree.
Then the number of distinct names inside [l, r] is:
bit.query(r) - bit.query(l - 1)
- Output answers in original order.
Why it works
For a fixed depth, the sweep invariant is:
After processing Euler positions up to
x, the Fenwick tree contains1exactly at the latest occurrence of every distinct name among positions<= x.
Consider a query interval [l, r].
If a name appears inside this interval, its latest occurrence up to r must also lie inside the interval. Earlier occurrences were removed when later ones appeared.
So every distinct name contributes exactly one active position inside [l, r], and names absent from the interval contribute zero.
The Fenwick range sum is exactly the number of distinct names in the subtree at the required depth.
Python Solution
import sys
from collections import defaultdict
input = sys.stdin.readline
class Fenwick:
def __init__(self, n):
self.n = n
self.bit = [0] * (n + 1)
def add(self, idx, val):
while idx <= self.n:
self.bit[idx] += val
idx += idx & -idx
def query(self, idx):
res = 0
while idx > 0:
res += self.bit[idx]
idx -= idx & -idx
return res
def range_query(self, l, r):
return self.query(r) - self.query(l - 1)
def solve():
n = int(input())
names = [""] * (n + 1)
children = [[] for _ in range(n + 1)]
roots = []
for i in range(1, n + 1):
s, r = input().split()
r = int(r)
names[i] = s
if r == 0:
roots.append(i)
else:
children[r].append(i)
tin = [0] * (n + 1)
tout = [0] * (n + 1)
depth = [0] * (n + 1)
depth_nodes = defaultdict(list)
timer = 0
sys.setrecursionlimit(1 << 25)
def dfs(v, d):
nonlocal timer
timer += 1
tin[v] = timer
depth[v] = d
depth_nodes[d].append((tin[v], names[v]))
for to in children[v]:
dfs(to, d + 1)
tout[v] = timer
for root in roots:
dfs(root, 0)
m = int(input())
queries_by_depth = defaultdict(list)
ans = [0] * m
for idx in range(m):
v, k = map(int, input().split())
target = depth[v] + k
queries_by_depth[target].append(
(tout[v], tin[v], idx)
)
bit = Fenwick(n)
for d in queries_by_depth:
nodes = depth_nodes.get(d, [])
nodes.sort()
queries = queries_by_depth[d]
queries.sort()
last = {}
ptr = 0
used_positions = []
for r, l, qid in queries:
while ptr < len(nodes) and nodes[ptr][0] <= r:
pos, name = nodes[ptr]
if name in last:
bit.add(last[name], -1)
bit.add(pos, 1)
last[name] = pos
used_positions.append(pos)
ptr += 1
ans[qid] = bit.range_query(l, r)
for pos in used_positions:
bit.add(pos, -1)
print("\n".join(map(str, ans)))
solve()
The DFS phase assigns Euler intervals. Because DFS fully explores a subtree before returning, every descendant of v receives a contiguous Euler range. That property is what converts subtree queries into interval queries.
depth_nodes[d] stores all nodes at depth d. The pair (tin[v], name[v]) is enough because subtree filtering later happens using Euler intervals only.
Queries are grouped by target depth rather than by source node. This is the central reduction. Once a depth is fixed, the problem becomes a standard offline distinct-elements-in-range problem.
The Fenwick tree stores active latest occurrences. When a name reappears, the previous occurrence is removed before inserting the new one. Without this removal step, duplicate names would be counted multiple times.
The cleanup loop at the end of each depth is subtle but necessary. The same Fenwick tree instance is reused across depths, so all updates from the current depth must be reverted.
Another easy mistake is sorting queries by tout[v]. The sweep processes nodes in increasing Euler order, so queries must also advance by right endpoint.
Worked Examples
Sample 1
Input:
6
pasha 0
gerald 1
gerald 1
valera 2
igor 3
olesya 1
5
1 1
1 2
1 3
3 1
6 1
DFS order:
| Node | Name | Depth | tin | tout |
|---|---|---|---|---|
| 1 | pasha | 0 | 1 | 6 |
| 2 | gerald | 1 | 2 | 3 |
| 4 | valera | 2 | 3 | 3 |
| 3 | gerald | 1 | 4 | 5 |
| 5 | igor | 2 | 5 | 5 |
| 6 | olesya | 1 | 6 | 6 |
Query conversion:
| Query | Target Depth | Interval |
|---|---|---|
| (1,1) | 1 | [1,6] |
| (1,2) | 2 | [1,6] |
| (1,3) | 3 | [1,6] |
| (3,1) | 2 | [4,5] |
| (6,1) | 2 | [6,6] |
Processing depth 1:
| Euler Pos | Name | Active Names |
|---|---|---|
| 2 | gerald | {gerald} |
| 4 | gerald | {gerald at 4} |
| 6 | olesya | {gerald, olesya} |
The subtree interval [1,6] contains two active names, so answer is 2.
Processing depth 2:
| Euler Pos | Name | Active Names |
|---|---|---|
| 3 | valera | {valera} |
| 5 | igor | {valera, igor} |
The full subtree gives 2, while interval [4,5] only contains igor, giving 1.
This trace shows why only latest occurrences remain active. The duplicate gerald nodes collapse into one distinct name.
Custom Example
Input:
5
a 0
a 1
b 1
a 2
b 3
3
1 1
1 2
2 1
DFS order:
| Node | Name | Depth | tin | tout |
|---|---|---|---|---|
| 1 | a | 0 | 1 | 5 |
| 2 | a | 1 | 2 | 3 |
| 4 | a | 2 | 3 | 3 |
| 3 | b | 1 | 4 | 5 |
| 5 | b | 2 | 5 | 5 |
Queries:
| Query | Target Depth | Interval |
|---|---|---|
| (1,1) | 1 | [1,5] |
| (1,2) | 2 | [1,5] |
| (2,1) | 2 | [2,3] |
Processing depth 1 gives active names {a, b}, answer 2.
Processing depth 2 gives active names {a, b} for the full subtree, answer 2.
For interval [2,3], only node 4 lies inside, answer 1.
This example demonstrates that subtree filtering and distinct counting interact correctly even when names repeat across different branches.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O((n + m) log n) | Each node and query performs Fenwick updates or queries |
| Space | O(n + m) | Tree storage, Euler arrays, grouped queries, Fenwick tree |
With 10^5 nodes and queries, roughly a few million Fenwick operations fit comfortably within the limits. Memory usage also remains linear.
Test Cases
# helper: run solution on input string, return output string
import sys
import io
from collections import defaultdict
def run(inp: str) -> str:
sys.stdin = io.StringIO(inp)
input = sys.stdin.readline
class Fenwick:
def __init__(self, n):
self.n = n
self.bit = [0] * (n + 1)
def add(self, idx, val):
while idx <= self.n:
self.bit[idx] += val
idx += idx & -idx
def query(self, idx):
res = 0
while idx > 0:
res += self.bit[idx]
idx -= idx & -idx
return res
def range_query(self, l, r):
return self.query(r) - self.query(l - 1)
n = int(input())
names = [""] * (n + 1)
children = [[] for _ in range(n + 1)]
roots = []
for i in range(1, n + 1):
s, r = input().split()
r = int(r)
names[i] = s
if r == 0:
roots.append(i)
else:
children[r].append(i)
tin = [0] * (n + 1)
tout = [0] * (n + 1)
depth = [0] * (n + 1)
depth_nodes = defaultdict(list)
timer = 0
sys.setrecursionlimit(1 << 25)
def dfs(v, d):
nonlocal timer
timer += 1
tin[v] = timer
depth[v] = d
depth_nodes[d].append((tin[v], names[v]))
for to in children[v]:
dfs(to, d + 1)
tout[v] = timer
for r in roots:
dfs(r, 0)
m = int(input())
queries_by_depth = defaultdict(list)
ans = [0] * m
for idx in range(m):
v, k = map(int, input().split())
target = depth[v] + k
queries_by_depth[target].append(
(tout[v], tin[v], idx)
)
bit = Fenwick(n)
for d in queries_by_depth:
nodes = sorted(depth_nodes.get(d, []))
queries = sorted(queries_by_depth[d])
last = {}
ptr = 0
used = []
for r, l, idx in queries:
while ptr < len(nodes) and nodes[ptr][0] <= r:
pos, name = nodes[ptr]
if name in last:
bit.add(last[name], -1)
bit.add(pos, 1)
last[name] = pos
used.append(pos)
ptr += 1
ans[idx] = bit.range_query(l, r)
for pos in used:
bit.add(pos, -1)
return "\n".join(map(str, ans))
# provided sample
assert run(
"""6
pasha 0
gerald 1
gerald 1
valera 2
igor 3
olesya 1
5
1 1
1 2
1 3
3 1
6 1
"""
) == "2\n2\n0\n1\n0"
# minimum size
assert run(
"""1
alex 0
1
1 1
"""
) == "0"
# repeated names
assert run(
"""5
john 0
john 1
john 1
alice 2
alice 3
2
1 1
1 2
"""
) == "1\n1"
# multiple roots
assert run(
"""4
a 0
b 0
c 1
d 2
2
1 1
2 1
"""
) == "1\n1"
# chain tree, missing depth
assert run(
"""4
a 0
b 1
c 2
d 3
2
1 5
2 2
"""
) == "0\n1"
| Test input | Expected output | What it validates |
|---|---|---|
| Single node tree | 0 | Missing target depth |
| Repeated names | 1 1 | Distinct counting correctness |
| Multiple roots | 1 1 | Forest traversal correctness |
| Deep chain with large k | 0 1 | Boundary depth handling |
Edge Cases
Consider a forest with multiple roots:
4
a 0
b 0
c 1
d 2
2
1 1
2 1
DFS starts from both roots 1 and 2. Euler intervals become:
| Node | tin | tout |
|---|---|---|
| 1 | 1 | 2 |
| 3 | 2 | 2 |
| 2 | 3 | 4 |
| 4 | 4 | 4 |
Query (1,1) maps to interval [1,2] at depth 1, which only contains node 3. Query (2,1) maps to [3,4], containing only node 4.
The algorithm handles disconnected trees because every root launches its own DFS.
Now consider nonexistent depths:
3
a 0
b 1
c 2
1
1 5
The target depth is 5, but depth_nodes[5] is empty.
During processing, the node list for this depth is empty, so no Fenwick updates happen. The range query immediately returns 0.
Repeated names inside the same subtree behave correctly:
5
john 0
john 1
john 1
alice 2
alice 3
2
1 1
1 2
At depth 1, nodes (2, john) and (3, john) appear. When processing the second occurrence, the first one is removed from the Fenwick tree.
So the subtree interval contains exactly one active occurrence for john, producing answer 1 instead of 2.