CF 316G2 - Good Substrings

Rating: 2200
Tags: string suffix structures
Solve time: 3m 7s
Verified: no

Solution

Problem Understanding

We are given a source string s. We are interested only in substrings that appear inside s, and we count each distinct substring once, regardless of how many times it occurs.

A substring is considered good if it satisfies every rule. Each rule consists of a pattern string p and a range [l, r]. For a candidate substring t, we count how many times t occurs inside p as a substring. The rule is satisfied when that occurrence count lies between l and r.

The task is to count how many distinct substrings of s satisfy all rules simultaneously.

The first thing that makes this problem difficult is the input size. The length of s and of every rule string may reach 50000. A string of length 50000 has about 1.25 billion (start, end) pairs, so any approach that explicitly enumerates substrings is hopeless.

The number of rules is small, at most 10. This asymmetry is the key. We have very long strings, but very few constraints.

A subtle point is that occurrences are counted with multiplicity. If a substring appears in many places inside a rule string, all of them contribute.

For example:

s = "aaa"
rule: ("aaa", 2, 2)

The substring "a" occurs three times in "aaa" and "aa" occurs twice, overlaps included. A solution that only checks existence would treat both counts as 1 and wrongly reject "aa".
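To make the multiplicity point concrete, here is a minimal sketch of a counter for overlapping occurrences. The helper name `count_occurrences` is an assumption introduced for illustration. Note that Python's built-in `str.count` counts non-overlapping matches only, so it is not a substitute.

```python
def count_occurrences(t: str, p: str) -> int:
    """Count occurrences of t in p, overlaps included (hypothetical helper)."""
    count = 0
    start = p.find(t)
    while start != -1:
        count += 1
        # Advance by one position only, so overlapping matches are kept.
        start = p.find(t, start + 1)
    return count


if __name__ == "__main__":
    for t in ("a", "aa", "aaa"):
        # Second number: overlapping count; third: str.count for comparison.
        print(t, count_occurrences(t, "aaa"), "aaa".count(t))
```

For "aa" inside "aaa" the two numbers differ (2 versus 1), which is exactly the case the rule ("aaa", 2, 2) depends on.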

Another easy mistake is forgetting that we count distinct substrings of s.

s = "aaaa"
n = 0

The answer is 4, corresponding to "a", "aa", "aaa", "aaaa".

Counting occurrences by position would incorrectly return 10.

A third pitfall comes from rules with lower bound zero.

s = "ab"
1
c 0 0

Every substring of s is good because none of them occurs in "c". Any implementation that only processes substrings appearing in rule strings would miss valid answers.

The constraints strongly suggest a suffix structure. Distinct substrings correspond naturally to states of a suffix automaton or intervals inside a suffix array. We need a way to evaluate all distinct substrings without enumerating them individually.

Approaches

The brute-force idea is straightforward. Generate every distinct substring of s. For each candidate substring t, and for each rule string p_i, count how many times t occurs inside p_i. If all counts lie inside the required range, increase the answer.

This is correct because it directly implements the definition. Unfortunately, it is catastrophically slow. There are Θ(|s|²) (start, end) pairs and potentially Θ(|s|²) distinct substring values. Even with efficient string matching, the total work is far beyond what is possible for length 50000.
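The brute force is still useful as a reference for checking a faster solution on small inputs. The following is a minimal sketch under that assumption. The function name `count_good_brute` and the `(p, l, r)` tuple layout for rules are choices made for this illustration only.

```python
def count_good_brute(s, rules):
    """Brute-force reference; rules is a list of (p, l, r) tuples."""

    def occurrences(t, p):
        # Overlapping occurrences of t in p.
        return sum(1 for i in range(len(p) - len(t) + 1) if p.startswith(t, i))

    # A set removes duplicates, so each distinct substring is judged once.
    distinct = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
    return sum(
        1
        for t in distinct
        if all(l <= occurrences(t, p) <= r for p, l, r in rules)
    )


if __name__ == "__main__":
    print(count_good_brute("aaab", [("aa", 0, 0), ("aab", 1, 1)]))
```

This is only practical for strings of a few hundred characters, but that is enough for randomized comparison against the automaton-based solution.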

The important observation is that every candidate substring comes from the same source string s. Instead of examining substrings one by one, we can represent all distinct substrings simultaneously inside a suffix automaton built from s.

A suffix automaton state represents an entire interval of substring lengths. Every distinct non-empty substring of s corresponds to exactly one pair:

(state v, length L)
where
len(link(v)) < L ≤ len(v)

This representation collapses all distinct substrings into only O(|s|) states.
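As a quick illustration of the (state, length) view, the sketch below builds a suffix automaton and counts distinct substrings by summing len(v) - len(link(v)) over the non-root states. The function names `build_sam` and `count_distinct_substrings` are assumptions for this example, and the construction is the usual online algorithm with dictionary transitions.

```python
def build_sam(s):
    """Build a suffix automaton; returns (transitions, suffix links, lengths)."""
    nxt, link, length = [{}], [-1], [0]
    last = 0
    for ch in s:
        cur = len(nxt)
        nxt.append({})
        link.append(0)
        length.append(length[last] + 1)
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p != -1:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone keeps the short strings of q.
                clone = len(nxt)
                nxt.append(dict(nxt[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return nxt, link, length


def count_distinct_substrings(s):
    _, link, length = build_sam(s)
    # State v > 0 owns one substring per length in (len(link(v)), len(v)].
    return sum(length[v] - length[link[v]] for v in range(1, len(length)))


if __name__ == "__main__":
    print(count_distinct_substrings("aaaa"))
```

With no rules at all, this sum is already the answer to the problem, which is why the rules can be handled as restrictions on each state's length interval.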

The remaining challenge is evaluating the rules. For a fixed rule string p, suppose we traverse p through the automaton of s. Every position of p determines the longest suffix ending there that appears in s, that is, a state v together with a matched length. Standard suffix automaton counting techniques then let us accumulate, for every state, how many occurrences its substrings have inside p, with one caveat: the matched length can be shorter than len(v), and in that case only the substrings of v up to the matched length occur at that position.

Since there are at most 10 rules, we can process each rule independently. Within one state the occurrence count in the rule string never increases as the substring length grows, because every occurrence of a longer substring contains an occurrence of each shorter one as a suffix. The lengths that satisfy a rule therefore form a contiguous interval. Intersecting the intervals from all rules gives the set of good substring lengths represented by that state.

Finally, we sum the number of lengths remaining in every state.

The whole solution exploits two facts:

First, a suffix automaton compresses all distinct substrings of s into O(|s|) states.

Second, the number of rule strings is tiny, making it feasible to process every rule over the automaton separately.

Approach	Time Complexity	Space Complexity	Verdict
Brute Force	O(|s|² · n · max|p_i|) or worse	-	Too slow
SAM-based Solution	O((n+1)·|s| + Σ|p_i| log |p_i| + n·|SAM|)	O(|SAM| + max|p_i|)	Fits the limits

Algorithm Walkthrough

Suffix Automaton Interpretation

For every state v in the suffix automaton:

mx = len(v)
mn = len(link(v)) + 1

The state represents all distinct substrings whose lengths belong to [mn, mx].

Each distinct substring corresponds to exactly one length inside exactly one state.

Processing One Rule

For a rule string p:

1. Traverse `p` through the automaton

For every position of p, maintain the current automaton state and the longest matched suffix length.

This is the standard SAM matching procedure.

2. Count endpoint contributions

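Whenever the traversal reaches a non-root state v after reading a character, increment a counter of that state and record the matched length.

This records that the longest suffix of the current prefix of p that occurs in s belongs to v.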

3. Propagate counts upward

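Process states in decreasing order of len.

Add each state's counter to its suffix link.

After this propagation, cnt[v] equals the number of positions of p whose matched state lies in the suffix-link subtree of v. A position matched in a proper descendant of v ends an occurrence of every substring represented by v. A position matched at v itself with matched length m ends an occurrence only of the substrings of v whose length is at most m.

This is the classical suffix automaton occurrence-count computation, adapted to a text other than s.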

4. Convert counts into valid length intervals

Suppose the rule requires:

l ≤ occurrences ≤ r

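For every state v and every length L it represents:

The occurrence count equals the number of positions matched in proper descendants of v plus the number of positions matched at v with matched length at least L. This count never increases as L grows.

Consequently the lengths whose count is at most r form a suffix of the state's length interval, and the lengths whose count is at least l form a prefix of it.

Instead of storing a boolean, we maintain for each state an interval of lengths still valid after processing all previous rules.

Intersect the current valid interval with the prefix and the suffix obtained for this rule. If either of them is empty, the state's interval becomes empty.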

5. Repeat for all rules

Every rule further restricts the allowed lengths in every state.

After all intersections, the remaining lengths correspond exactly to substrings satisfying every rule.
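One detail deserves care in step 4: the traversal can reach a state with a matched length shorter than len(v), so the count inside one state may depend on the length. The sketch below narrows one state's interval under that model. The name `narrow_interval` and its parameters are assumptions for illustration: `base` is the number of positions matched in proper descendants of the state, and `visit_lens` holds the matched lengths of positions matched at the state itself.

```python
def narrow_interval(mn, mx, base, visit_lens, lo, hi):
    """Return (first, last) lengths in [mn, mx] whose count lies in [lo, hi].

    count(L) = base + number of entries of visit_lens that are >= L,
    which never increases as L grows. An empty result is returned as (1, 0).
    """
    d = sorted(visit_lens, reverse=True)
    k = len(d)
    # The count ranges between base (long lengths) and base + k (short ones).
    if base > hi or base + k < lo:
        return 1, 0
    # Smallest L with count(L) <= hi: at most (hi - base) visits may cover L.
    spare = hi - base
    first = mn if spare >= k else max(mn, d[spare] + 1)
    # Largest L with count(L) >= lo: at least (lo - base) visits must cover L.
    need = lo - base
    last = mx if need <= 0 else min(mx, d[need - 1])
    return (first, last) if first <= last else (1, 0)


if __name__ == "__main__":
    # State holding "b", "ab", "aab", "aaab" of s = "aaab", rule ("aab", 1, 1):
    # the traversal reaches it once with matched length 3.
    print(narrow_interval(1, 4, 0, [3], 1, 1))
```

In the example call the result keeps lengths 1 to 3 and drops length 4, matching the fact that "aaab" does not occur in "aab".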

Final Counting

For each state:

allowed = max(0, right[v] - left[v] + 1)

Add this quantity to the answer.

Each surviving length corresponds to exactly one distinct substring.

Why it works

A suffix automaton partitions all distinct substrings of s into disjoint state-length pairs. Every substring belongs to exactly one state and one length value inside that state's length interval.

For a fixed rule string p, a substring t of s occurs ending at position i of p exactly when t is a suffix of the longest match at i. That means t either belongs to the matched state with length at most the matched length, or belongs to one of its suffix-link ancestors. All substrings represented by a state share the same end-position set in s, but not necessarily in p: the propagated counter gives the occurrences contributed by descendants, and the matched lengths recorded at the state give the remainder.

Because a longer substring of a state contains every shorter one as a suffix, the count never increases with length, so the lengths that satisfy a rule form one contiguous range inside the state. Intersecting the surviving length ranges across all rules leaves exactly the substrings satisfying every constraint. Summing the remaining lengths counts each distinct good substring exactly once.
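A small check helps separate end positions in s from occurrences in a rule string. The sketch below uses a hypothetical helper `end_positions` on Sample 1 to show four substrings that share one end-position set in s, and therefore one automaton state, yet do not all have the same count in the rule string "aab".

```python
def end_positions(t, text):
    """1-based end indices of all occurrences of t in text, overlaps included."""
    return tuple(
        i + len(t)
        for i in range(len(text) - len(t) + 1)
        if text.startswith(t, i)
    )


if __name__ == "__main__":
    s, p = "aaab", "aab"
    for t in ("b", "ab", "aab", "aaab"):
        # Columns: substring, end positions in s, occurrence count in p.
        print(t, end_positions(t, s), len(end_positions(t, p)))
```

The counts in p only drop as the length grows, which is what keeps the surviving lengths of a state contiguous.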

Python Solution

import sys
input = sys.stdin.readline

class SAM:
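    def __init__(self):
        self.next = [{}]
        self.link = [-1]
        self.length = [0]
        self.last = 0

    def extend(self, ch):
        cur = len(self.next)
        self.next.append({})
        self.link.append(0)
        # The new state extends the state of the whole previous prefix. The most
        # recently appended state may be a clone, so index by self.last.
        self.length.append(self.length[self.last] + 1)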

        p = self.last
        while p != -1 and ch not in self.next[p]:
            self.next[p][ch] = cur
            p = self.link[p]

        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][ch]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                clone = len(self.next)
                self.next.append(self.next[q].copy())
                self.link.append(self.link[q])
                self.length.append(self.length[p] + 1)

                while p != -1 and self.next[p].get(ch) == q:
                    self.next[p][ch] = clone
                    p = self.link[p]

                self.link[q] = clone
                self.link[cur] = clone

        self.last = cur

    def build(self, s):
        self.last = 0
        for c in s:
            self.extend(c)

def solve():
    s = input().strip()
    n = int(input())

    sam = SAM()
    sam.build(s)

    sz = len(sam.next)

    maxlen = max(sam.length)
    cnt_len = [0] * (maxlen + 1)

    for x in sam.length:
        cnt_len[x] += 1

    for i in range(1, maxlen + 1):
        cnt_len[i] += cnt_len[i - 1]

    order = [0] * sz
    for v in range(sz - 1, -1, -1):
        l = sam.length[v]
        cnt_len[l] -= 1
        order[cnt_len[l]] = v

    left = [sam.length[sam.link[v]] + 1 if v else 1
            for v in range(sz)]
    right = sam.length[:]

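    for _ in range(n):
        p, L, R = input().split()
        L = int(L)
        R = int(R)

        occ = [0] * sz   # visits per state, later summed over link subtrees
        visits = {}      # state -> matched lengths of the visits ending there

        v = 0
        cur_len = 0

        for ch in p:
            while v and ch not in sam.next[v]:
                v = sam.link[v]
                cur_len = sam.length[v]

            if ch in sam.next[v]:
                v = sam.next[v][ch]
                cur_len += 1
            else:
                v = 0
                cur_len = 0

            if v:
                occ[v] += 1
                visits.setdefault(v, []).append(cur_len)

        # Decreasing length order: occ[x] becomes the number of visits inside
        # the suffix-link subtree of x.
        for idx in range(sz - 1, 0, -1):
            x = order[idx]
            occ[sam.link[x]] += occ[x]

        for state in range(1, sz):
            if left[state] > right[state]:
                continue
            d = visits.get(state)
            if d is None:
                # No direct visit: every length of this state has count occ.
                if not (L <= occ[state] <= R):
                    left[state], right[state] = 1, 0
                continue
            # count(length) = base + (visits with matched length >= length),
            # which is non-increasing in length.
            d.sort(reverse=True)
            k = len(d)
            base = occ[state] - k
            if base > R or occ[state] < L:
                left[state], right[state] = 1, 0
                continue
            if R - base < k:
                left[state] = max(left[state], d[R - base] + 1)
            if L - base > 0:
                right[state] = min(right[state], d[L - base - 1])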

    ans = 0
    for v in range(1, sz):
        if left[v] <= right[v]:
            ans += right[v] - left[v] + 1

    print(ans)

if __name__ == "__main__":
    solve()

The implementation builds a suffix automaton of the source string once. The states are sorted by length so that suffix-link propagation can be performed efficiently.

For each rule string we run the standard automaton matching procedure. Every position contributes one endpoint count. Processing states in decreasing length order propagates these counts through suffix links and converts endpoint frequencies into substring occurrence frequencies.

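The critical observation is that within a state the occurrence count in the processed rule string can only decrease as the length grows, and it changes only at the matched lengths with which the traversal reached that state. Because of that, each rule narrows a state's surviving lengths to a contiguous sub-interval, possibly an empty one.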

The root state is never counted in the final answer because it represents the empty string rather than a non-empty substring.

Worked Examples

Sample 1

Input:

aaab
2
aa 0 0
aab 1 1

The distinct substrings are:

a
aa
aaa
aaab
aab
ab
b

Rule 1 counts occurrences in "aa".

Substring	Occurrences in "aa"	Pass?
a	2	No
aa	1	No
aaa	0	Yes
aaab	0	Yes
aab	0	Yes
ab	0	Yes
b	0	Yes

Rule 2 counts occurrences in "aab".

Substring	Occurrences in "aab"	Pass?
aaa	0	No
aaab	0	No
aab	1	Yes
ab	1	Yes
b	1	Yes

The surviving substrings are:

aab
ab
b

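Answer: 3

This example demonstrates how a substring may pass one rule and fail another. In the automaton, "b", "ab", "aab" and "aaab" share one state; the traversal of "aab" reaches it with matched length 3, so only lengths 1 to 3 survive rule 2. The same filtering therefore happens through interval intersections inside SAM states.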

Example 2

Input:

ab
1
c 0 0

Substring	Occurrences in "c"	Good?
a	0	Yes
b	0	Yes
ab	0	Yes

Answer: 3

This example exercises the lower-bound-zero case. Every substring absent from the rule string remains valid.

Complexity Analysis

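Measure	Complexity	Explanation
Time	O(|s| + Σ|p_i| log |p_i| + n·|SAM|)	Automaton construction, one traversal per rule, and one pass over all states per rule
Space	O(|SAM| + max|p_i|)	The automaton plus per-rule counters and matched lengths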

A suffix automaton contains at most 2|s| - 1 states for |s| ≥ 2. With |s| ≤ 50000, this means roughly 100000 states. Since there are at most 10 rules, traversing all states for each rule easily fits within the limits.

Test Cases

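import io
from contextlib import redirect_stdout

import solution  # assumption: the solution above is saved as solution.py

def run(inp: str) -> str:
    # solve() reads through the module-level name `input`, so rebinding it
    # feeds the test input without touching the real stdin.
    solution.input = io.StringIO(inp).readline
    out = io.StringIO()
    with redirect_stdout(out):
        solution.solve()
    return out.getvalue()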

# provided sample
assert run(
"""aaab
2
aa 0 0
aab 1 1
""") == "3\n"

# no rules, distinct substrings of "a"
assert run(
"""a
0
""") == "1\n"

# all substrings accepted
assert run(
"""ab
1
c 0 0
""") == "3\n"

# repeated characters
assert run(
"""aaaa
0
""") == "4\n"

# exact occurrence constraint
assert run(
"""aaa
1
aaa 2 2
""") == "1\n"

Test input	Expected output	What it validates
`a, n=0`	`1`	Minimum size
`ab` with rule `c 0 0`	`3`	Zero-occurrence acceptance
`aaaa, n=0`	`4`	Distinct substring counting
`aaa` with rule `aaa 2 2`	`1`	Exact occurrence filtering

Edge Cases

Consider:

a
0

The suffix automaton contains one non-root state representing length interval [1,1]. The final summation contributes exactly one substring. The answer is 1.

Consider:

aaaa
0

There are ten positional substrings but only four distinct values. The SAM representation stores lengths 1,2,3,4 exactly once. Summing interval sizes produces 4, avoiding duplicate counting automatically.

Consider:

ab
1
c 0 0

The traversal of "c" never leaves the root. Every non-empty substring has occurrence count zero in the rule string. Since zero lies inside the allowed range, all represented lengths survive. The answer becomes 3.

Consider:

aaa
1
aaa 2 2

Occurrences are counted with multiplicity. "a" occurs three times, "aa" occurs twice, "aaa" occurs once. Only "aa" satisfies the constraint. The suffix automaton propagation computes these exact frequencies through end-position aggregation, giving the correct answer 1.
