CF 316G3 - Good Substrings

Rating: 2400
Tags: string suffix structures
Solve time: 1m 10s
Verified: yes

Solution

Problem Understanding

We are given a main string $s$ and several rules. Each rule consists of a pattern string $p$ and a numeric interval $[l, r]$. For any candidate string $t$, we count how many times $t$ appears as a substring of $p$, where occurrences are counted by starting position in $p$, so overlapping occurrences count separately. The string $t$ complies with the rule if its number of occurrences in $p$ lies within $[l, r]$. A string is called good if it complies with every rule simultaneously.

The task is to consider every distinct substring of $s$ and count how many of them are good.

The key structure is that we are filtering substrings of $s$ using constraints defined by occurrence counts inside other strings, not inside $s$ itself. That makes this problem fundamentally about comparing substrings of $s$ against many pattern databases and evaluating frequency constraints efficiently.

There are at most 10 rules, but $s$ and every pattern can have length up to 50000. This immediately suggests that per-candidate substring processing must be extremely efficient. Enumerating substrings of $s$ is already $O(|s|^2)$, so any solution must avoid recomputing pattern matches per substring.

A naive approach would consider each substring $t$ of $s$, and for each rule, scan the pattern $p$ and count occurrences of $t$. That is at least $O(|s|^2 \cdot |p|)$, which is completely infeasible for 50000-length strings.

A subtler difficulty remains even with an automaton over $s$. The automaton groups substrings by where they end in $s$, but the rules count occurrences inside other strings $p$. Two substrings that share a state in the automaton of $s$ can occur a different number of times in $p$, so a single counter per state is not enough.

A concrete edge case illustrating why naive substring enumeration breaks:

Input:

aaaaa
1
aaaaa 1 1

Every substring of $s$ is a run of 'a'. The run of length $k$ occurs $6 - k$ times in $p$, so only "aaaaa" occurs exactly once and the answer is 1. A naive approach rescans $p$ for every one of these runs, repeating almost identical work even though all substrings are structurally similar.

The real challenge is to aggregate substrings of $s$ in a structure where we can query each substring once and reuse information across rules.

Approaches

The brute-force solution iterates over all substrings $t = s[i..j]$, and for each rule $(p, l, r)$, it counts occurrences of $t$ in $p$. Counting occurrences itself is $O(|p|)$ using naive matching or KMP, so overall complexity becomes $O(|s|^2 \cdot \sum |p|)$, which is far beyond any limit.

Speeding up a single occurrence count does not help, because the number of substrings to check is already quadratic in $|s|$.
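For reference, the brute force can be sketched as follows. This is only a checker for tiny inputs; the names `count_overlapping` and `brute_force`, and the representation of rules as `(p, lo, hi)` tuples, are assumptions made for this illustration.

```python
def count_overlapping(t, p):
    """Number of start positions i with p[i:i+len(t)] == t (overlaps allowed)."""
    return sum(1 for i in range(len(p) - len(t) + 1) if p.startswith(t, i))


def brute_force(s, rules):
    """Count distinct substrings of s that comply with every (p, lo, hi) rule."""
    distinct = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
    return sum(
        all(lo <= count_overlapping(t, p) <= hi for p, lo, hi in rules)
        for t in distinct
    )


print(brute_force("aaab", [("aa", 0, 0), ("aab", 1, 1)]))  # 3
```

It is useful mainly for cross-checking a faster solution on small random inputs.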

The key observation is that we should not treat substrings independently. Instead, we build a suffix automaton over $s$, which compactly represents all distinct substrings. Every state corresponds to a set of substrings that share the same set of end positions in $s$, and transitions correspond to appending a character.

A state $v$ holds exactly the substrings whose lengths lie in $(\text{len}(\text{link}(v)), \text{len}(v)]$, so every distinct substring of $s$ is identified by exactly one (state, length) pair, and we can count contributions per state rather than per substring.
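The per-state length ranges can be illustrated by counting the distinct substrings of a string. The sketch below uses a compact list-based automaton; `build_sam` and `count_distinct_substrings` are names chosen for this example and are not the classes used in the solution.

```python
def build_sam(s):
    """Return (next, link, length) lists of the suffix automaton of s."""
    nxt, link, length = [{}], [-1], [0]
    last = 0
    for c in s:
        cur = len(nxt)
        nxt.append({})
        link.append(-1)
        length.append(length[last] + 1)
        p = last
        while p != -1 and c not in nxt[p]:
            nxt[p][c] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][c]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone keeps the shorter substrings of its class.
                clone = len(nxt)
                nxt.append(dict(nxt[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and nxt[p].get(c) == q:
                    nxt[p][c] = clone
                    p = link[p]
                link[q] = link[cur] = clone
        last = cur
    return nxt, link, length


def count_distinct_substrings(s):
    """Each non-root state v contributes length[v] - length[link[v]] substrings."""
    _, link, length = build_sam(s)
    return sum(length[v] - length[link[v]] for v in range(1, len(link)))


print(count_distinct_substrings("aaab"))  # 7
```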

Now the problem reduces to computing, for each state, how many of its lengths satisfy all rules. For a fixed pattern $p$ this requires the number of occurrences in $p$ of every substring of $s$. Substrings in the same state end at the same positions in $s$, but they need not occur equally often in $p$, so the count has to be known per (state, length) pair rather than per state.

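We process each pattern independently. For a pattern $p$, we run it through the suffix automaton, computing for every position $e$ the state $v_e$ and the length $m_e$ of the longest substring of $s$ that ends at $e$. Every substring of $s$ ending at $e$ is a suffix of that longest match, so position $e$ contributes one occurrence to the lengths up to $m_e$ in state $v_e$ and to every length in each proper suffix-link ancestor of $v_e$. The traversal itself is linear in $|p|$, and the contributions are aggregated per automaton state.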
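To pin down what "longest match ending there" means, here is a deliberately slow reference version that uses plain substring tests instead of the automaton. The function name `longest_match_lengths` is hypothetical and only serves this illustration.

```python
def longest_match_lengths(s, p):
    """For each end position e of p, the length of the longest suffix of
    p[:e + 1] that occurs somewhere in s (0 if even the last character is absent)."""
    result = []
    for e in range(len(p)):
        best = 0
        for start in range(e + 1):  # longest candidate suffix first
            if p[start:e + 1] in s:
                best = e + 1 - start
                break
        result.append(best)
    return result


print(longest_match_lengths("aaab", "aab"))   # [1, 2, 3]
print(longest_match_lengths("abab", "abba"))  # [1, 2, 1, 2]
```

If the value at position `e` is `m`, then the suffixes of lengths `1..m` ending at `e` are exactly the substrings of `s` that occur there; the automaton traversal yields the same numbers in amortized constant time per character.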

Within one state the occurrence count can only decrease as the length grows, so the lengths whose count lies in $[l, r]$ form a contiguous interval. Each state keeps the intersection of these intervals over all rules.

The essential structure is inversion: instead of iterating substrings and checking patterns, we iterate patterns and propagate matches into the automaton structure once per pattern.

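Approach	Time Complexity	Space Complexity	Verdict
Brute Force	$O(|s|^2 \cdot \sum |p|)$	$O(|s|^2)$ substrings for deduplication	Too slow
Suffix Automaton aggregation	$O(|s| \log |s| + n \cdot |s| + \sum |p| \log |p|)$, with $n$ the number of rules	$O(|s| + \sum |p|)$	Fast enough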

Algorithm Walkthrough

We build a suffix automaton over string $s$, where each state represents a set of substrings ending at various positions in $s$.

We also maintain for each state its length range, which corresponds to all substring lengths represented by that state.

We then process each rule one by one, using its pattern string $p$.

For each pattern $p$, we traverse it through the suffix automaton. At each position, we maintain the current matched state and the length of match. This allows us to identify which substrings of $s$ appear ending at each position in $p$.

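Instead of explicitly enumerating substrings, we record at each reached state the match lengths observed there. A substring of length $k$ in state $v$ then occurs once for every length recorded at $v$ that is at least $k$, plus once for every pattern position whose match landed in a proper descendant of $v$ in the suffix-link tree.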
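One way to turn per-state counters into a range of acceptable lengths is sketched below. It assumes the match lengths observed at a state are available as a list and that `below` is the number of pattern positions whose match landed in a proper descendant of the state; the function name and parameters are illustrative, not taken from the solution.

```python
def valid_length_range(min_len, max_len, below, match_lengths, lo, hi):
    """Lengths k in [min_len, max_len] whose occurrence count
    below + #{m in match_lengths : m >= k} lies in [lo, hi].
    Returns (first, last), or None if no length qualifies."""
    ms = sorted(match_lengths, reverse=True)
    need_lo, need_hi = lo - below, hi - below
    if need_hi < 0 or need_lo > len(ms):
        return None
    first, last = min_len, max_len
    if need_lo >= 1:
        # At least need_lo recorded lengths must be >= k.
        last = min(last, ms[need_lo - 1])
    if need_hi < len(ms):
        # At most need_hi recorded lengths may be >= k.
        first = max(first, ms[need_hi] + 1)
    return (first, last) if first <= last else None


# State holding "b", "ab", "aab", "aaab"; one match of length 3; rule [1, 1].
print(valid_length_range(1, 4, 0, [3], 1, 1))  # (1, 3)
```

Because the count is non-increasing in `k`, the qualifying lengths always form one contiguous range, which is why two list lookups suffice.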

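After processing a pattern, we push the per-state position counts up the suffix-link tree, longer states first, and sort the lengths recorded at each state. Because the count is non-increasing in the length, the lengths whose count lies in $[l, r]$ form one interval, which is found by indexing into the sorted list.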
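The conversion step can be sketched as a bottom-up sum over the suffix-link tree. The arrays below are written out by hand for the automaton of "aaab" after matching the pattern "aab"; the function name `counts_below` and the array layout are assumptions of this example.

```python
def counts_below(link, length, direct):
    """direct[v]: number of pattern positions whose longest match landed in v.
    Returns below[v]: positions that landed in a proper descendant of v
    in the suffix-link tree."""
    total = list(direct)
    # Longer states first, so each state is handled before its link parent.
    for v in sorted(range(1, len(link)), key=lambda v: -length[v]):
        total[link[v]] += total[v]
    return [total[v] - direct[v] for v in range(len(link))]


# States 1..3 hold "a", "aa", "aaa"; state 4 holds "b", "ab", "aab", "aaab".
link = [-1, 0, 1, 2, 0]
length = [0, 1, 2, 3, 4]
direct = [0, 1, 1, 0, 1]  # "aab" reaches states 1, 2 and 4 once each
print(counts_below(link, length, direct))  # [3, 1, 0, 0, 0]
```

For state 1 this gives one occurrence from the descendant state 2 plus one direct hit, matching the two occurrences of "a" in "aab".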

Each state intersects this interval with the one kept from earlier rules, so once all rules have been processed it holds the range of lengths that satisfy every rule.

We then iterate over all states and sum the sizes of these ranges. A state contributes the number of its lengths that survived every rule, which may be fewer than the number of distinct substrings it represents.

Why it works

The suffix automaton partitions all substrings of $s$ into equivalence classes, one per state, and every substring belongs to exactly one (state, length) pair. A substring $t$ of $s$ occurs ending at position $e$ of a pattern exactly when $t$ is a suffix of the longest match at $e$. That happens when $t$ lies in the matched state with length at most the match length, or in one of that state's proper suffix-link ancestors. Counting positions this way therefore counts every occurrence exactly once. Tracking the count per length inside a state is necessary because substrings that share a state in the automaton of $s$ may occur a different number of times in the pattern.

Python Solution

import sys
input = sys.stdin.readline

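class State:
    # One automaton state: outgoing transitions, suffix link, and the length
    # of the longest substring in its equivalence class.
    __slots__ = ("next", "link", "length")
    def __init__(self):
        self.next = {}
        self.link = -1
        self.length = 0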

class SuffixAutomaton:
    def __init__(self, s):
        self.st = [State() for _ in range(2 * len(s) + 5)]
        self.size = 1
        self.last = 0

        for ch in s:
            self.extend(ch)

    def extend(self, c):
        p = self.last
        cur = self.size
        self.size += 1
        self.st[cur].length = self.st[p].length + 1

        while p != -1 and c not in self.st[p].next:
            self.st[p].next[c] = cur
            p = self.st[p].link

        if p == -1:
            self.st[cur].link = 0
        else:
            q = self.st[p].next[c]
            if self.st[p].length + 1 == self.st[q].length:
                self.st[cur].link = q
            else:
                clone = self.size
                self.size += 1
                self.st[clone].length = self.st[p].length + 1
                self.st[clone].next = self.st[q].next.copy()
                self.st[clone].link = self.st[q].link

                while p != -1 and self.st[p].next[c] == q:
                    self.st[p].next[c] = clone
                    p = self.st[p].link

                self.st[q].link = self.st[cur].link = clone

        self.last = cur

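def process_pattern(sam, p):
    """Walk p through the automaton of s.

    Returns a dict mapping each reached state to the match lengths observed
    there: one entry per position of p, holding the length of the longest
    suffix ending at that position that is a substring of s.
    """
    st = sam.st
    hits = {}
    v = 0
    l = 0
    for ch in p:
        # Shorten the match along suffix links until ch can be appended.
        while v and ch not in st[v].next:
            v = st[v].link
            l = st[v].length
        if ch in st[v].next:
            v = st[v].next[ch]
            l += 1
        else:
            v = 0
            l = 0
        if v:
            hits.setdefault(v, []).append(l)
    return hits

def solve():
    s = input().strip()
    n = int(input())
    rules = [input().split() for _ in range(n)]

    sam = SuffixAutomaton(s)
    st, size = sam.st, sam.size

    # Longer states first, so each state precedes its suffix-link parent.
    order = sorted(range(1, size), key=lambda i: -st[i].length)

    # Per state, the lengths [first, last] that satisfy every rule seen so far.
    first = [0] * size
    last = [0] * size
    for i in range(1, size):
        first[i] = st[st[i].link].length + 1
        last[i] = st[i].length

    for p, lo, hi in rules:
        lo = int(lo)
        hi = int(hi)
        hits = process_pattern(sam, p)

        # total[v]: positions of p whose longest match lies in v's link subtree.
        total = [0] * size
        for v, ls in hits.items():
            total[v] = len(ls)
        for v in order:
            total[st[v].link] += total[v]

        for v in range(1, size):
            ls = hits.get(v)
            m = len(ls) if ls else 0
            below = total[v] - m
            # A substring of length k in state v occurs
            # below + #{x in ls : x >= k} times in p.
            need_lo = lo - below
            need_hi = hi - below
            if need_hi < 0 or need_lo > m:
                last[v] = 0  # no length of this state can satisfy the rule
                continue
            if m:
                ls.sort(reverse=True)
                if need_lo >= 1:
                    last[v] = min(last[v], ls[need_lo - 1])
                if need_hi < m:
                    first[v] = max(first[v], ls[need_hi] + 1)

    ans = 0
    for i in range(1, size):
        if last[i] >= first[i]:
            ans += last[i] - first[i] + 1

    print(ans)

if __name__ == "__main__":
    solve()

The suffix automaton construction is standard: every state tracks transitions, suffix links, and the maximum length of substrings in its class.

The process_pattern function walks each pattern through the automaton. Whenever a transition is missing, it follows suffix links to the longest suffix of the current match that can be extended and resets the match length to that state's length. Each character is handled in amortized constant time, so the traversal is linear in the pattern length.

The dictionary hits records, for every state reached, the match lengths observed there. The array total turns the per-state position counts into subtree sums over the suffix-link tree, which gives the occurrences that come from longer matches ending in descendant states.

Finally, each state contributes the size of its surviving length range, which is at most the number of distinct substrings it represents.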

Worked Examples

Example 1

Input:

aaab
2
aa 0 0
aab 1 1

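We build a suffix automaton for "aaab". It has four non-root states: state 1 holds "a", state 2 holds "aa", state 3 holds "aaa", and state 4 holds "b", "ab", "aab", "aaab".

We process pattern "aa" first.

Step	Pattern char	State	Action
1	a	1	take the 'a' transition from the root, match length 1
2	a	2	extend the match to length 2

"a" occurs twice in "aa" (once at state 1 itself and once through its descendant state 2) and "aa" occurs once, so both violate $[0, 0]$. "aaa" and every string in state 4 occur zero times and stay valid.

For "aab", the traversal reaches states 1, 2 and 4 with match lengths 1, 2 and 3. "aaa" never occurs, so it fails $[1, 1]$. In state 4, lengths 1 to 3 occur once and length 4 ("aaab") occurs zero times, so the valid range shrinks to lengths 1 to 3.

The surviving substrings are "b", "ab" and "aab", giving answer 3.

This also shows why counts are tracked per length: "aab" and "aaab" share state 4 but occur a different number of times in the pattern.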

Example 2

Input:

abab
1
ab 1 2

The distinct substrings of $s$ are "a", "b", "ab", "ba", "aba", "bab", "abab".

Occurrences are counted inside the pattern "ab", not inside $s$: "a", "b" and "ab" each occur once there, and the four longer strings do not occur at all.

Only substrings whose occurrence count inside "ab" is within $[1,2]$ remain valid, so the answer is 3. In the automaton, "b" and "ab" share one state and both lengths survive, while "bab" and "abab" share another state whose lengths are all rejected.

Complexity Analysis

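Measure	Complexity	Explanation
Time	$O(|s| \log |s| + n \cdot |s| + \sum |p| \log |p|)$	build and order the automaton once; per rule, one traversal of $p$, one pass over the states, and a sort of the recorded match lengths ($n$ is the number of rules)
Space	$O(|s| + \sum |p|)$	at most $2|s|$ automaton states, the stored patterns, and per-rule arrays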

The constraints allow this because every string has length at most 50000 and there are at most 10 rules, so $|s| + \sum |p| \le 550000$.

Test Cases

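import io
from contextlib import redirect_stdout

# Assumes solve() from the solution above is defined in this same module.
def run(inp: str) -> str:
    global input
    input = io.StringIO(inp).readline  # solve() reads through the module-level input
    out = io.StringIO()
    with redirect_stdout(out):
        solve()
    return out.getvalue().strip()

# provided sample
assert run("aaab\n2\naa 0 0\naab 1 1\n") == "3"

# single character, no rules
assert run("a\n0\n") == "1"

# repeated letter: only "a" occurs in the pattern "a"
assert run("aaaaa\n1\na 1 10\n") == "1"

# alternating: nothing occurs twice in "ab"
assert run("ababab\n1\nab 2 3\n") == "0"

# every substring of "abc" occurs exactly once in "abc"
assert run("abc\n1\nabc 1 1\n") == "6"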

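Test input	Expected output	What it validates
aaab with rules (aa, 0, 0) and (aab, 1, 1)	3	provided sample, two rules intersected
a with no rules	1	minimal input, zero rules
aaaaa with rule (a, 1, 10)	1	repetition handling; longer runs never occur in the pattern
ababab with rule (ab, 2, 3)	0	lower bound above every occurrence count
abc with rule (abc, 1, 1)	6	every substring occurs exactly once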

Edge Cases

A key edge case is when occurrences overlap heavily, such as runs of 'a' inside "aaaaa". A matcher that restarts from scratch after each match would miss overlapping occurrences. The automaton traversal keeps the longest match at every position and only shortens it along suffix links, so each overlapping occurrence is counted at the position where it ends.

Another case is when no substring of $s$ occurs in a pattern. The traversal then never leaves the root and every count is zero, so the rule keeps all substrings when $l = 0$ and rejects all of them otherwise, without any special handling.

A final edge case is when all substrings are valid due to very loose intervals like $[0, |p|]$. In this case, every SAM state contributes its full interval size, and the algorithm degenerates gracefully to counting all distinct substrings of $s$, which is exactly the sum of state contributions.