CF 220B - Little Elephant and Array
We are given an array of positive integers and multiple queries, each specifying a contiguous subarray. For each query, we are asked to count how many numbers appear in the subarray exactly as many times as their own value.
CF 220B - Little Elephant and Array
Rating: 1800
Tags: constructive algorithms, data structures
Solve time: 1m 7s
Verified: yes
Solution
Problem Understanding
We are given an array of positive integers and multiple queries, each specifying a contiguous subarray. For each query, we are asked to count how many numbers appear in the subarray exactly as many times as their own value. For instance, if a number 3 appears exactly 3 times in the queried segment, it contributes to the count. The output is one integer per query.
The constraints are tight. Both the array size and the number of queries can reach 100,000. A naive approach that iterates over each query and counts occurrences for every number would require roughly $O(n \cdot m)$ operations in the worst case. That could reach $10^{10}$ operations, far beyond what a 4-second time limit allows. Any algorithm we use must avoid iterating over the entire subarray for every query.
An important edge case is numbers larger than the length of the queried segment. A number greater than the segment length can never appear exactly that many times. For example, in the array [1, 5, 5, 2] with query [1, 3], the number 5 cannot contribute because it would need to appear 5 times in a subarray of length 3. Another subtle case is repeated small numbers: if 1 appears once, it counts; if 2 appears twice, it counts; if it appears more than twice, it does not. Handling these correctly is crucial.
Approaches
The brute-force approach iterates over every query, builds a frequency dictionary of numbers in the subarray, and counts how many numbers have frequency equal to their own value. This works correctly but has worst-case complexity $O(n \cdot m)$, which is too slow.
The key insight for optimization is that the problem is a variant of range query problems, where we need the frequency of elements over subarrays. Traditional segment trees or binary-indexed trees can be adapted, but a more fitting approach here is Mo's algorithm, which processes queries in a specific order to minimize the number of array elements added or removed from the current segment when moving between queries. Mo's algorithm sorts queries by block and right endpoint, then maintains a sliding window on the array. We keep a frequency dictionary of numbers in the current window and maintain a running count of numbers whose frequency equals their value. When we add or remove an element from the window, we update the count in constant time. This reduces the complexity to $O(n \sqrt n)$ plus $O(m \sqrt n)$ for query adjustments.
The optimization relies on the observation that the frequency of a number changes predictably as we expand or shrink the segment, and that we only need to check if the frequency matches the number itself. Any number greater than $n$ (the array size) can be ignored because it cannot appear enough times to satisfy the condition.
| Approach | Time Complexity | Space Complexity | Verdict |
|---|---|---|---|
| Brute Force | O(n*m) | O(n) | Too slow |
| Mo's Algorithm | O((n + m) * sqrt(n)) | O(n) | Accepted |
Algorithm Walkthrough
- Read the array and queries. Keep the queries with their original indices so we can restore the output order after sorting.
- Determine a block size, typically $\sqrt n$, to divide the array for Mo's algorithm.
- Sort the queries first by block of the left endpoint, then by right endpoint. This ordering ensures that when moving from one query to the next, we minimize the number of elements added or removed from the current segment.
- Initialize the current segment as empty. Keep a frequency dictionary of elements in the segment, and a running variable
current_answerfor the number of elements whose frequency equals their value. - For each query in sorted order, adjust the segment to match the query range:
- Expand or shrink the left and right boundaries. For each element added, increase its frequency in the dictionary. If the frequency equals the element, increment
current_answer. If the frequency before adding was equal to the element, decrementcurrent_answer(since the match is broken). Similarly, when removing an element, update the frequency andcurrent_answer.
- Store
current_answerin a result array at the index corresponding to the original query. - Output the result array in the original query order.
Why it works: The invariant is that current_answer always reflects the count of numbers in the current segment whose frequency equals their value. Each addition or removal correctly updates this invariant in constant time, so when a query segment is reached, current_answer is guaranteed to be correct.
Python Solution
import sys
import math
from collections import defaultdict
input = sys.stdin.readline
n, m = map(int, input().split())
a = list(map(int, input().split()))
queries = []
for i in range(m):
l, r = map(int, input().split())
queries.append((l-1, r-1, i))
block_size = int(math.sqrt(n)) + 1
queries.sort(key=lambda x: (x[0] // block_size, x[1]))
freq = defaultdict(int)
current_answer = 0
res = [0] * m
def add(x):
global current_answer
f = freq[x]
if f == x:
current_answer -= 1
freq[x] += 1
if freq[x] == x:
current_answer += 1
def remove(x):
global current_answer
f = freq[x]
if f == x:
current_answer -= 1
freq[x] -= 1
if freq[x] == x:
current_answer += 1
l, r = 0, -1
for ql, qr, idx in queries:
while r < qr:
r += 1
if a[r] <= n:
add(a[r])
while r > qr:
if a[r] <= n:
remove(a[r])
r -= 1
while l < ql:
if a[l] <= n:
remove(a[l])
l += 1
while l > ql:
l -= 1
if a[l] <= n:
add(a[l])
res[idx] = current_answer
print('\n'.join(map(str, res)))
The solution reads input and prepares the queries with their original indices. We compute a block size based on the array length to organize the queries for Mo's algorithm. The add and remove functions carefully maintain the running count of numbers whose frequency matches their value. Numbers larger than n are ignored because they can never satisfy the condition. The sorted queries are processed in order, updating the segment incrementally, ensuring that each query result is correct.
Worked Examples
Sample input:
7 2
3 1 2 2 3 3 7
1 7
3 4
| Step | Segment | Frequencies | current_answer | Query index |
|---|---|---|---|---|
| Start | [] | {} | 0 | - |
| Query 1 [0,6] | [3,1,2,2,3,3,7] | {1:1,2:2,3:3,7:1} | 3 | 0 |
| Query 2 [2,3] | [2,2] | {2:2} | 1 | 1 |
This shows that the algorithm correctly tracks frequencies and counts numbers satisfying the condition, adjusting as the window moves between queries.
Another test:
Input:
5 1
5 5 5 5 5
1 5
All numbers are larger than 5, so none can match their frequency. current_answer remains 0, output is 0. The algorithm correctly ignores elements that can never satisfy the condition.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O((n + m) * sqrt(n)) | Mo's algorithm processes each query with ~sqrt(n) adjustments per query in the worst case. |
| Space | O(n) | Frequency dictionary can hold at most n elements; result array of size m. |
With $n, m \le 10^5$, $(n + m) * \sqrt n \approx 10^5 * 316 \approx 3*10^7$ operations, comfortably within 4 seconds.
Test Cases
import sys, io
def run(inp: str) -> str:
sys.stdin = io.StringIO(inp)
exec(open("solution.py").read())
return sys.stdout.getvalue().strip()
# provided samples
assert run("7 2\n3 1 2 2 3 3 7\n1 7\n3 4\n") == "3\n1", "sample 1"
# custom cases
assert run("5 1\n5 5 5 5 5\n1 5\n") == "0", "all elements too large"
assert run("4 2\n1 1 1 1\n1 4\n2 3\n") == "1\n1", "all ones"
assert run("6 1\n1 2 2 3 3 3\n1 6\n") == "3", "mixed small frequencies"
assert run("1 1\n1\n1 1\n") == "1", "single element array"
| Test input | Expected output | What it validates |
|---|---|---|
| 5 1\n |