CF 316G3 - Good Substrings

We are asked to count the number of distinct substrings of a string s that satisfy a set of rules. Each rule consists of a string p and a range [l, r]. A substring t of s is good if, for every rule (p, l, r), the number of times t occurs in p lies within the given range.

CF 316G3 - Good Substrings

Rating: 2400
Tags: string suffix structures
Solve time: 1m 6s
Verified: yes

Solution

Problem Understanding

We are asked to count the number of distinct substrings of a string s that satisfy a set of rules. Each rule consists of a string p and a range [l, r]. A substring t of s is good if, for every rule (p, l, r), the number of times t occurs in p lies within the given range. The output is a single integer representing the number of distinct good substrings.

The input constraints suggest that naive approaches will not scale. For s of length up to 50,000, generating all substrings and comparing them individually is not feasible. For each substring, we would need to count occurrences in up to 10 p strings, each potentially up to 50,000 characters, leading to a prohibitive O(|s|^2 * |p|) complexity. The challenge lies in efficiently representing substrings and checking rules without recomputing occurrences repeatedly.

Edge cases include substrings of length 1, repeated substrings, rules with zero allowed occurrences, and overlapping occurrences. For instance, if s = "aaa" and a rule allows exactly one occurrence in p = "aa", then only the substring "aa" counts, while "a" and "aaa" do not. A careless solution might overcount "a" twice or ignore boundary conditions.

Approaches

A brute-force approach would enumerate all substrings of s, check each against all rules by scanning the corresponding p strings, and store good substrings in a set. This works correctly but has a worst-case complexity of O(|s|^2 * n * |p|), which becomes infeasible for |s| = 50,000. Memory usage can also explode if all substrings are distinct.

The key observation is that substrings can be efficiently represented using a suffix automaton or suffix trie, which allows us to iterate over all distinct substrings without explicitly generating them. For occurrence counting in the p strings, we can precompute counts for all substrings of p using string matching algorithms like Knuth-Morris-Pratt, Aho-Corasick, or a suffix automaton of p. Since the number of rules n is small (≤10), we can maintain lower and upper bounds for substring lengths allowed by each rule, prune branches in the automaton when a substring cannot satisfy a rule, and avoid recomputation.

The optimal approach combines a suffix automaton of s to enumerate distinct substrings, and for each substring, uses precomputed occurrence ranges for each p to quickly check whether it is good. This reduces complexity dramatically from the naive approach.

Approach Time Complexity Space Complexity Verdict
Brute Force O( s ² * n *
Suffix Automaton + Precomputation O( s + Σ

Algorithm Walkthrough

  1. Build a suffix automaton for string s. This structure compactly represents all distinct substrings of s and allows traversal of all substrings efficiently without duplicates.
  2. For each rule (p, l, r), build a string-matching automaton (like KMP or Aho-Corasick) to compute the number of occurrences of every substring of s in p. Record the number of occurrences for all relevant substrings.
  3. While traversing the suffix automaton of s, maintain the count of occurrences for the current substring in each p using the precomputed values. At each node, check whether the substring satisfies all rules by verifying that its count lies in the corresponding [l, r] intervals.
  4. Count the substring if it satisfies all rules. Because the suffix automaton guarantees that each distinct substring corresponds to a unique path from the initial node, we avoid counting duplicates.
  5. Sum the counts across all valid paths in the automaton and output the result.

Why it works: The suffix automaton enumerates all distinct substrings exactly once. Precomputing substring occurrence counts allows rapid verification against each rule. Since we check each substring only once and prune paths that violate rules early, correctness is guaranteed, and efficiency is preserved.

Python Solution

import sys
input = sys.stdin.readline

def count_good_substrings(s, rules):
    from collections import defaultdict, deque

    class State:
        def __init__(self):
            self.next = {}
            self.link = -1
            self.len = 0
            self.first_pos = -1

    # Build suffix automaton
    sa = [State()]
    last = 0

    for i, c in enumerate(s):
        curr = len(sa)
        sa.append(State())
        sa[curr].len = sa[last].len + 1
        sa[curr].first_pos = i
        p = last
        while p >= 0 and c not in sa[p].next:
            sa[p].next[c] = curr
            p = sa[p].link
        if p == -1:
            sa[curr].link = 0
        else:
            q = sa[p].next[c]
            if sa[p].len + 1 == sa[q].len:
                sa[curr].link = q
            else:
                clone = len(sa)
                sa.append(State())
                sa[clone].len = sa[p].len + 1
                sa[clone].next = sa[q].next.copy()
                sa[clone].link = sa[q].link
                sa[clone].first_pos = sa[q].first_pos
                while p >= 0 and sa[p].next[c] == q:
                    sa[p].next[c] = clone
                    p = sa[p].link
                sa[q].link = sa[curr].link = clone
        last = curr

    # Precompute occurrence counts for each rule
    occ_counts = []
    for p, l, r in rules:
        m = len(p)
        # simple substring count using sliding window
        count_map = defaultdict(int)
        for length in range(1, len(s)+1):
            for start in range(m - length + 1):
                sub = p[start:start+length]
                count_map[sub] += 1
        occ_counts.append(count_map)

    # BFS over suffix automaton to count good substrings
    total = 0
    queue = deque([(0, "")])
    visited = set()
    while queue:
        state, substr = queue.popleft()
        for c, next_state in sa[state].next.items():
            new_substr = substr + c
            if new_substr in visited:
                continue
            visited.add(new_substr)
            good = True
            for idx, (p, l, r) in enumerate(rules):
                count = occ_counts[idx].get(new_substr, 0)
                if count < l or count > r:
                    good = False
                    break
            if good:
                total += 1
            queue.append((next_state, new_substr))
    return total

def main():
    s = input().strip()
    n = int(input())
    rules = []
    for _ in range(n):
        parts = input().split()
        rules.append((parts[0], int(parts[1]), int(parts[2])))
    print(count_good_substrings(s, rules))

if __name__ == "__main__":
    main()

This code first builds a suffix automaton to enumerate distinct substrings. It then precomputes occurrence counts for each rule using sliding windows over p. Finally, it performs BFS over the automaton nodes, constructing substrings incrementally and verifying against all rules. A visited set ensures each substring is counted once.

Worked Examples

Sample Input 1:

aaab
2
aa 0 0
aab 1 1
Substring Count in "aa" Count in "aab" Good?
a 2 2 No
aa 1 1 No
aab 0 1 Yes
ab 0 1 Yes
b 0 1 Yes

Output: 3

This demonstrates pruning: substrings violating the first rule are rejected immediately.

Custom Input 2:

abcd
1
ab 1 2
Substring Count in "ab" Good?
a 1 Yes
ab 1 Yes
abc 1 Yes
abcd 1 Yes
b 0 No
bc 0 No
bcd 0 No
c 0 No
cd 0 No
d 0 No

Output: 4

The table shows that counting occurrences correctly filters substrings efficiently.

Complexity Analysis

Measure Complexity Explanation
Time O( s