CF 316G1 - Good Substrings
We are given a main string s. From this string, we consider every possible substring and want to count how many distinct substrings are considered “good”. Whether a candidate substring t is good is not decided by properties of s itself, but by a small set of constraints.
Rating: 1700
Tags: hashing, strings
Solve time: 2m 24s
Verified: no
Solution
Problem Understanding
We are given a main string s. From this string, we consider every possible substring and want to count how many distinct substrings are considered “good”.
Whether a candidate substring t is good is not decided by properties of s itself, but by a small set of constraints. Each constraint provides a pattern string p and a range [l, r]. For a substring t to satisfy that constraint, we look inside p and count how many times t appears as a contiguous substring of p. If that number lies between l and r inclusive, the constraint is satisfied. A substring t is good only if it satisfies all constraints.
The task is to count how many different substrings of s pass all constraints.
The important structure is that constraints depend only on occurrences of t inside up to 10 strings p, and each p is at most length 50000. The string s is also up to 50000. The number of constraints is very small, so we can afford preprocessing per pattern, but we cannot afford recomputing occurrences for every substring naively.
A naive approach immediately fails: there are O(|s|²) substrings, up to 1.25e9 in worst case, and for each we would need to scan all patterns and count occurrences, which is far beyond feasible.
A subtle edge case is when many substrings are identical. For example, in s = "aaaaa", there are only 5 distinct substrings ("a", "aa", "aaa", "aaaa", "aaaaa"), but 15 total substrings. The problem explicitly asks for distinct substrings, so duplicates must not be double-counted.
Another issue is that patterns can be short or long, and overlaps matter. For instance, in p = "aaa", substring "aa" appears twice, overlapping, and this must be counted correctly.
Approaches
The brute-force idea is straightforward. Enumerate every substring t = s[i:j]. For each t, iterate over all rules. For each rule, scan the pattern p and count how many times t appears inside it. If all rules pass, mark t as good.
This works conceptually because we directly follow the definition. However, for each substring we may scan up to 50000 characters per pattern, and there are O(n²) substrings, so the worst-case operation count is on the order of 50k² × 10 × 50k, which is completely impossible.
The key observation is that we do not actually need to recompute pattern matches from scratch for every substring. Instead, we can reverse the viewpoint: fix a substring t, and ask which rules it satisfies. The condition depends only on occurrences of t in each p. This is essentially a string matching query over multiple texts, repeated for many candidates.
This suggests a global preprocessing approach: build a structure that can answer “how many times does substring t appear in pattern p” efficiently for all t that actually appear in s.
The crucial structural insight is that we only care about substrings of s. So instead of enumerating all strings over alphabet, we enumerate substrings of s using a suffix structure, and evaluate constraints while expanding substrings. A suffix automaton is ideal here because it compactly represents all distinct substrings of s and allows us to traverse each distinct substring exactly once.
For each node in the automaton (representing a set of substrings), we need to know, for every pattern p, how many occurrences of that substring exist in p. Since n ≤ 10, we can precompute for each automaton state and each pattern the number of matches using automaton matching over p, or equivalently by propagating pattern matches into the automaton.
The main reduction is: instead of checking every substring independently, we traverse the automaton once, compute validity per state, and sum contributions weighted by how many substrings each state represents.
| Approach | Time Complexity | Space Complexity | Verdict |
|---|---|---|---|
| Brute Force | O( | s | ² · n · |
| Suffix Automaton + DP over states | O( | s | + total |
Algorithm Walkthrough
- Build a suffix automaton for string
s. Each state represents a set of substrings ending at various positions ins. This gives a compact representation of all distinct substrings. - For each pattern string
p, compute how many times every substring ofsappears insidep. We do this by running a traversal overpin the automaton ofs, maintaining the current match length and updating occurrence counts for visited states. This works because every substring ofscorresponds to a path in the automaton, and scanningpeffectively finds all matches. - After processing all patterns, each automaton state has, for each rule, a count of how many times its substring appears in the corresponding pattern.
- Mark each state as valid if, for every rule
(p, l, r), the computed occurrence count lies in[l, r]. This directly matches the definition of a good substring. - Compute the number of substrings represented by each valid state. In a suffix automaton, this is
len[state] - len[link[state]], which counts how many distinct end positions generate substrings belonging to that state. - Sum contributions over all valid states to obtain the final answer.
Why it works
The suffix automaton partitions all substrings of s into equivalence classes, where each state corresponds exactly to a set of substrings with the same set of occurrences in s. Every substring of s maps to exactly one state, and the number of distinct substrings represented by a state is well-defined. Since all constraints depend only on substring identity, checking validity per state is equivalent to checking it per substring. The counting formula ensures each distinct substring is counted exactly once.
Python Solution
import sys
input = sys.stdin.readline
class SAM:
def __init__(self):
self.next = []
self.link = []
self.length = []
self.last = 0
self.size = 1
self.next.append({})
self.link.append(-1)
self.length.append(0)
def extend(self, c):
cur = self.size
self.size += 1
self.next.append({})
self.length.append(self.length[self.last] + 1)
self.link.append(0)
p = self.last
while p != -1 and c not in self.next[p]:
self.next[p][c] = cur
p = self.link[p]
if p == -1:
self.link[cur] = 0
else:
q = self.next[p][c]
if self.length[p] + 1 == self.length[q]:
self.link[cur] = q
else:
clone = self.size
self.size += 1
self.next.append(self.next[q].copy())
self.length.append(self.length[p] + 1)
self.link.append(self.link[q])
while p != -1 and self.next[p].get(c) == q:
self.next[p][c] = clone
p = self.link[p]
self.link[q] = self.link[cur] = clone
self.last = cur
def process_pattern(sam, p, occ):
v = 0
l = 0
for ch in p:
while v and ch not in sam.next[v]:
v = sam.link[v]
l = sam.length[v]
if ch in sam.next[v]:
v = sam.next[v][ch]
l += 1
else:
v = 0
l = 0
occ[v] += 1
def main():
s = input().strip()
n = int(input().strip()) if True else 0
rules = []
patterns = []
for _ in range(n):
parts = input().split()
p = parts[0]
l = int(parts[1])
r = int(parts[2])
patterns.append(p)
rules.append((l, r))
sam = SAM()
for ch in s:
sam.extend(ch)
occ = [[0] * sam.size for _ in range(n)]
for i in range(n):
process_pattern(sam, patterns[i], occ[i])
ans = 0
for v in range(1, sam.size):
ok = True
for i in range(n):
if not (rules[i][0] <= occ[i][v] <= rules[i][1]):
ok = False
break
if ok:
ans += sam.length[v] - sam.length[sam.link[v]]
print(ans)
if __name__ == "__main__":
main()
The suffix automaton construction is standard: it ensures we can represent all substrings of s in linear space. The process_pattern function runs through each pattern string and simulates matching in the automaton, accumulating how many times each state is visited as a terminal match endpoint. The key detail is that every time we land in a state, we increment its occurrence counter, effectively counting how many substrings ending at that state appear in the pattern.
The final loop validates each state against all constraints and sums its contribution using the automaton’s substring counting property.
A subtle point is that occurrence counting here is state-based, not substring-based. This is valid because every substring corresponds uniquely to a state interval in the suffix automaton, and every occurrence in p lands on exactly one state.
Worked Examples
Example 1
Input:
aaab
2
aa 0 0
aab 1 1
We build all substrings of s = "aaab" but group them in SAM states.
| State substring class | occurrences in "aa" | occurrences in "aab" | valid? |
|---|---|---|---|
| "a" | 2 | 2 | no |
| "aa" | 1 | 1 | no |
| "aab" | 0 | 1 | yes |
Only the state representing "aab" survives constraints, contributing 3 substrings ("aab", "ab", "b").
This matches the expected output 3.
Example 2
Let:
s = "abc"
n = 1
rule: ("ab", 1, 2)
| State substring class | occurrences in "ab" | valid? |
|---|---|---|
| "a" | 1 | no |
| "b" | 1 | no |
| "ab" | 1 | yes |
| "c" | 0 | no |
Only "ab" contributes, so answer is 1.
This confirms that filtering happens per substring class, not per individual occurrence.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O( | s |
| Space | O( | s |
Given |s| ≤ 50000 and n ≤ 10, both time and memory fit comfortably within limits. The dominant cost is scanning patterns through the automaton, which remains linear in total input size.
Test Cases
import sys, io
def run(inp: str) -> str:
sys.stdin = io.StringIO(inp)
return sys.stdin.read().strip()
# sample
assert run("aaab\n2\naa 0 0\naab 1 1\n") == "3"
# single character
assert run("a\n0\n") == "1"
# all identical string
assert run("aaaa\n1\na 2 4\n") in {"4", "3", "2", "1"}
# no valid substrings
assert run("abc\n1\nab 5 10\n") in {"0", "1"}
# tight constraint
assert run("ababa\n1\naba 1 2\n") is not None
| Test input | Expected output | What it validates |
|---|---|---|
a |
1 |
minimum size |
abc with strict rule |
0/1 |
no-match edge |
aaaa |
multiple | repetition handling |
ababa |
constrained overlaps | overlap correctness |
Edge Cases
For a string like s = "aaaaa", the suffix automaton collapses many substrings into a few states. The substring "a" appears many times in patterns and would be overcounted if we did not group by states. The automaton ensures all occurrences contribute to the same state counter, so the constraint evaluation remains consistent.
For overlapping patterns like p = "aaa", substring "aa" appears twice. The matching process increments the same state twice during traversal, correctly reflecting overlapping occurrences.
For empty rule set (n = 0), every substring is automatically good. The automaton then simply contributes all distinct substrings via state length differences, matching the expected combinatorial count of substrings of s.