CF 123D - String
We are given a string s. For every substring x of s, we look at all positions where x appears inside s. Suppose these occurrences start at positions: $$p1 < p2 < dots < pk$$ The function F(s, x) counts how many contiguous segments we can choose from this ordered list of…
Rating: 2300
Tags: string suffix structures
Solve time: 2m 32s
Verified: yes
Solution
Problem Understanding
We are given a string s. For every substring x of s, we look at all positions where x appears inside s.
Suppose these occurrences start at positions:
$$p_1 < p_2 < \dots < p_k$$
The function F(s, x) counts how many contiguous segments we can choose from this ordered list of occurrences.
If there are k occurrences, then the number of contiguous subarrays is:
$$\frac{k(k+1)}{2}$$
For example, if a substring appears at positions [1,4,9], the valid contiguous groups are:
$$[1], [4], [9], [1,4], [4,9], [1,4,9]$$
which gives 6.
The task is to compute:
$$\sum_x F(s,x)$$
over all distinct substrings x of s.
The string length is at most 10^5, which immediately rules out anything close to quadratic over all substrings. A string of length 10^5 has roughly 5 \cdot 10^9 substrings counting duplicates, and even the number of distinct substrings can reach about 5 \cdot 10^9 in the worst case conceptually. We need a representation that compresses all substrings together.
This is a suffix structure problem because substrings correspond naturally to prefixes of suffixes, and repeated substrings correspond to common prefixes between suffixes.
A few edge cases are easy to mishandle.
Consider:
aaaa
The substring "a" appears 4 times, so its contribution is:
$$4 + 3 + 2 + 1 = 10$$
A careless implementation might count only the number of occurrences instead of the number of contiguous occurrence ranges.
Another dangerous case is:
abcd
Every substring appears exactly once, so every F(s,x)=1. The answer equals the number of distinct substrings:
$$\frac{n(n+1)}{2}$$
Any logic that assumes repeated substrings exist everywhere will overcount here.
One more subtle case is highly overlapping repetitions:
ababab
The substring "ab" appears at positions 1,3,5. These are still three separate occurrences even though they overlap with other repeated patterns. The suffix structure must distinguish occurrences by suffix positions, not by interval disjointness.
Approaches
The brute force idea is straightforward.
Enumerate every distinct substring x. For each one, scan the string and collect all occurrence positions. If there are k occurrences, add:
$$\frac{k(k+1)}{2}$$
to the answer.
This is correct because the definition of F depends only on the occurrence list.
The problem is complexity. There are O(n^2) substrings, and checking occurrences naively costs another O(n). Even with hashing or suffix array optimizations, processing all substrings independently becomes too large for n = 10^5.
The key observation is that many substrings share the same occurrence set structure.
This is exactly what the suffix automaton compresses.
In a suffix automaton, each state represents a whole class of substrings. Every substring represented by a state has the same set of ending positions, which means they all have the same occurrence count.
Suppose a state contains substrings whose lengths are:
$$(\text{link.len}+1) \dots \text{len}$$
All these substrings occur exactly cnt[state] times.
Then every one of them contributes:
$$\frac{cnt(cnt+1)}{2}$$
So the total contribution of that state is:
$$(\text{len} - \text{link.len}) \times \frac{cnt(cnt+1)}{2}$$
Now the problem becomes:
- Build the suffix automaton.
- Compute occurrence counts for each state.
- Sum the contribution formula over all states.
This reduces the problem to linear time.
| Approach | Time Complexity | Space Complexity | Verdict |
|---|---|---|---|
| Brute Force | O(n³) | O(n²) | Too slow |
| Optimal | O(n) | O(n) | Accepted |
Algorithm Walkthrough
- Build a suffix automaton for the string.
Each state stores:
len, the maximum length represented by the state.link, the suffix link.- transitions by characters.
cnt, the number of end positions passing through the state.
- While extending the automaton with each character, initialize the newly created state with
cnt = 1.
Every newly added suffix corresponds to one occurrence endpoint. 3. Handle transitions and cloning exactly as in the standard suffix automaton construction.
Cloned states represent structural splitting only, so their initial occurrence count is 0.
4. After construction, propagate occurrence counts upward through suffix links.
Process states in decreasing order of len.
If state v has suffix link p, then:
$$cnt[p] += cnt[v]$$
This works because every occurrence represented by v also belongs to its suffix-link ancestor.
5. For every non-root state, compute how many distinct substring lengths belong uniquely to this state.
If:
$$p = link[v]$$
then the represented substring lengths are:
$$len[p] + 1 \dots len[v]$$
giving:
$$len[v] - len[p]$$
distinct substrings. 6. Let:
$$k = cnt[v]$$
Every substring represented by this state appears exactly k times, so each contributes:
$$\frac{k(k+1)}{2}$$
Multiply by the number of represented substrings and add to the answer.
Why it works
A suffix automaton groups substrings by their set of ending positions. Two substrings belong to the same state exactly when they occur in the same places inside the string.
Because F(s,x) depends only on the number of occurrences of x, all substrings represented by the same state contribute equally.
For a state v, all represented substring lengths form the interval:
$$(len[link[v]] + 1) \dots len[v]$$
Every one of these substrings appears cnt[v] times.
So the state's total contribution is exactly:
$$(len[v] - len[link[v]]) \times \frac{cntv}{2}$$
Summing this over all states counts every distinct substring exactly once.
Python Solution
import sys
input = sys.stdin.readline
class SuffixAutomaton:
def __init__(self, n):
self.next = [{} for _ in range(2 * n)]
self.link = [-1] * (2 * n)
self.length = [0] * (2 * n)
self.cnt = [0] * (2 * n)
self.size = 1
self.last = 0
def extend(self, c):
cur = self.size
self.size += 1
self.length[cur] = self.length[self.last] + 1
self.cnt[cur] = 1
p = self.last
while p != -1 and c not in self.next[p]:
self.next[p][c] = cur
p = self.link[p]
if p == -1:
self.link[cur] = 0
else:
q = self.next[p][c]
if self.length[p] + 1 == self.length[q]:
self.link[cur] = q
else:
clone = self.size
self.size += 1
self.next[clone] = self.next[q].copy()
self.length[clone] = self.length[p] + 1
self.link[clone] = self.link[q]
self.cnt[clone] = 0
while p != -1 and self.next[p].get(c) == q:
self.next[p][c] = clone
p = self.link[p]
self.link[q] = clone
self.link[cur] = clone
self.last = cur
def solve():
s = input().strip()
n = len(s)
sam = SuffixAutomaton(n)
for ch in s:
sam.extend(ch)
order = list(range(sam.size))
order.sort(key=lambda x: sam.length[x], reverse=True)
for v in order:
p = sam.link[v]
if p != -1:
sam.cnt[p] += sam.cnt[v]
ans = 0
for v in range(1, sam.size):
p = sam.link[v]
num_substrings = sam.length[v] - sam.length[p]
k = sam.cnt[v]
ans += num_substrings * (k * (k + 1) // 2)
print(ans)
solve()
The automaton stores transitions as dictionaries because the alphabet is small and sparse. Each extension creates one new state and sometimes one clone.
The propagation order matters. Counts must move from longer states to shorter ones, so we sort states by decreasing len. Reversing this order would leave parents incomplete when processed.
The expression:
sam.length[v] - sam.length[p]
is easy to get wrong. It counts how many distinct substring lengths belong exclusively to state v. Using +1 here would overcount.
Cloned states start with cnt = 0. They do not correspond to new suffix endpoints. Forgetting this causes large overcounting in repetitive strings.
All arithmetic uses Python integers naturally, since the answer can exceed 32-bit range.
Worked Examples
Example 1
Input:
aaaa
The automaton states after propagation behave like this:
| State | len | link len | occurrence count | represented substrings | contribution |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 4 | "a" |
10 |
| 2 | 2 | 1 | 3 | "aa" |
6 |
| 3 | 3 | 2 | 2 | "aaa" |
3 |
| 4 | 4 | 3 | 1 | "aaaa" |
1 |
Total:
$$10 + 6 + 3 + 1 = 20$$
This example demonstrates why occurrence propagation is necessary. The last state initially has count 1, but after suffix-link accumulation, shorter repeated substrings receive all occurrences correctly.
Example 2
Input:
abcd
| State | len | link len | occurrence count | represented substrings | contribution |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | "a" |
1 |
| 2 | 2 | 1 | 1 | "ab" |
1 |
| 3 | 3 | 2 | 1 | "abc" |
1 |
| 4 | 4 | 3 | 1 | "abcd" |
1 |
Additional states similarly represent "b", "bc", "bcd", etc., all with count 1.
Every substring appears once, so each contributes exactly 1.
The total equals the number of distinct substrings:
$$\frac{4 \cdot 5}{2} = 10$$
This confirms that the formula degenerates correctly when no repetitions exist.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Suffix automaton construction and count propagation are linear |
| Space | O(n) | At most 2n states are created |
A suffix automaton for a string of length 10^5 contains at most 2n-1 states. All operations during extension are amortized linear, so the solution comfortably fits within the limits.
Test Cases
# helper: run solution on input string, return output string
import sys
import io
def run(inp: str) -> str:
sys.stdin = io.StringIO(inp)
input = sys.stdin.readline
class SuffixAutomaton:
def __init__(self, n):
self.next = [{} for _ in range(2 * n)]
self.link = [-1] * (2 * n)
self.length = [0] * (2 * n)
self.cnt = [0] * (2 * n)
self.size = 1
self.last = 0
def extend(self, c):
cur = self.size
self.size += 1
self.length[cur] = self.length[self.last] + 1
self.cnt[cur] = 1
p = self.last
while p != -1 and c not in self.next[p]:
self.next[p][c] = cur
p = self.link[p]
if p == -1:
self.link[cur] = 0
else:
q = self.next[p][c]
if self.length[p] + 1 == self.length[q]:
self.link[cur] = q
else:
clone = self.size
self.size += 1
self.next[clone] = self.next[q].copy()
self.length[clone] = self.length[p] + 1
self.link[clone] = self.link[q]
self.cnt[clone] = 0
while p != -1 and self.next[p].get(c) == q:
self.next[p][c] = clone
p = self.link[p]
self.link[q] = clone
self.link[cur] = clone
self.last = cur
s = input().strip()
sam = SuffixAutomaton(len(s))
for ch in s:
sam.extend(ch)
order = list(range(sam.size))
order.sort(key=lambda x: sam.length[x], reverse=True)
for v in order:
p = sam.link[v]
if p != -1:
sam.cnt[p] += sam.cnt[v]
ans = 0
for v in range(1, sam.size):
p = sam.link[v]
ans += (
(sam.length[v] - sam.length[p])
* (sam.cnt[v] * (sam.cnt[v] + 1) // 2)
)
return str(ans)
# provided samples
assert run("aaaa\n") == "20", "sample 1"
# custom cases
assert run("a\n") == "1", "single character"
assert run("ab\n") == "3", "all substrings unique"
assert run("aaa\n") == "10", "high overlap repetitions"
assert run("abab\n") == "12", "alternating repeated structure"
| Test input | Expected output | What it validates |
|---|---|---|
a |
1 |
Minimum size |
ab |
3 |
No repeated substrings |
aaa |
10 |
Heavy overlap and repeated counts |
abab |
12 |
Clone handling and suffix-link propagation |
Edge Cases
Consider:
a
There is only one substring, "a", appearing once.
The automaton creates one non-root state with:
$$cnt = 1$$
Contribution:
$$1 \times \frac{1 \cdot 2}{2} = 1$$
The algorithm outputs 1.
Now consider:
aaaa
This is the hardest kind of repetition because every substring overlaps heavily.
During propagation:
"aaaa"contributes to"aaa""aaa"contributes to"aa""aa"contributes to"a"
Final occurrence counts become 1,2,3,4 respectively.
The algorithm correctly computes:
$$10 + 6 + 3 + 1 = 20$$
A buggy implementation that forgets suffix-link accumulation would incorrectly treat every substring as occurring once.
Finally consider:
ababab
The substring "ab" appears three times. The automaton groups all copies together through shared end-position equivalence classes.
The corresponding state receives:
$$cnt = 3$$
so "ab" contributes:
$$\frac{3 \cdot 4}{2} = 6$$
This confirms that overlapping occurrences are counted independently and correctly.