CF 494B - Obsessive String

Rating: 2000
Tags: dp, strings
Solve time: 1m 52s
Verified: yes

Solution

Problem Understanding

We are given two strings, s and t. The task is to count how many ways we can choose one or more non-overlapping substrings from s such that each chosen substring contains t somewhere inside it. The substrings must be disjoint in s, and the order of selection follows the string's natural order. The answer should be given modulo $10^9 + 7$.

The input lengths can reach $10^5$. A naive approach that examines every possible substring of s is immediately infeasible because the number of substrings is on the order of $n^2$, which could be up to $10^{10}$ operations, far exceeding the time limit. This signals that an efficient linear or near-linear solution is required, possibly using dynamic programming.

A subtle edge case occurs when t appears multiple times with overlap. For example, if s = "aaa" and t = "aa", the valid substrings are "aa" starting at position 1, "aa" starting at position 2, and combinations using both without overlap. A careless implementation might miss overlaps or double-count them. Another edge case is when t is longer than s or does not appear at all; the answer should correctly be 0 in such cases.

Approaches

The brute-force approach enumerates all possible non-overlapping substrings of s, checks if each contains t, and counts combinations. Formally, for every starting index i, we would iterate over all ending indices j > i, check whether s[i:j] contains t, and combine these with dynamic programming to ensure non-overlap. This approach works in principle, but the number of substrings is $\frac{n(n+1)}{2}$, leading to $O(n^3)$ operations if each substring is checked for t explicitly. This is infeasible for $n = 10^5$.

The key insight is that we do not need to check every substring explicitly. Instead, we can precompute all starting positions in s where t occurs. Then, we can iterate through s and use dynamic programming to count ways to select substrings ending at each position. Let dp[i] be the number of ways to choose valid substrings from the prefix s[0..i]. For each position where t ends, we can extend previous solutions or start a new sequence. This reduces the complexity to linear time after computing occurrences of t, since each valid ending contributes to the DP exactly once.

Approach	Time Complexity	Space Complexity	Verdict
Brute Force	O(n^3)	O(n^2)	Too slow
Optimal	O(n + m)	O(n)	Accepted

Algorithm Walkthrough

Precompute all positions in s where t occurs. This can be done using a string-matching algorithm such as KMP or a simple sliding window, which runs in $O(n + m)$ time.
Initialize a DP array dp of length n+1 where dp[i] represents the number of ways to choose valid substrings ending at or before index i. Also maintain a prefix sum array pref to quickly compute sums of DP values.
Iterate through s from left to right. At each position i, if t ends at i, we add all ways to select substrings from the prefix before i - len(t) plus one new substring consisting only of t itself. This can be expressed as dp[i] = (pref[i - len(t)] + 1) % MOD.
Update the prefix sum array: pref[i] = (pref[i-1] + dp[i]) % MOD.
After processing all positions, pref[n] contains the total number of ways to select non-overlapping substrings containing t.

Why it works: Each DP entry counts all valid combinations ending at that position without double-counting because we only extend solutions from previous non-overlapping positions. The prefix sum efficiently aggregates all these contributions. Since we consider all occurrences of t as endpoints, every valid configuration is counted exactly once.

Python Solution

import sys
input = sys.stdin.readline

MOD = 10**9 + 7

def main():
    s = input().strip()
    t = input().strip()
    n, m = len(s), len(t)
    
    # Step 1: find all positions where t occurs using a sliding window
    match = [0] * n
    for i in range(n - m + 1):
        if s[i:i+m] == t:
            match[i + m - 1] = 1  # mark the ending position of t

    # Step 2: DP array
    dp = [0] * (n + 1)
    pref = [0] * (n + 1)
    
    for i in range(n):
        if match[i]:
            dp[i+1] = (pref[i - m + 1] + 1) % MOD
        pref[i+1] = (pref[i] + dp[i+1]) % MOD
    
    print(pref[n] % MOD)

if __name__ == "__main__":
    main()

Each section of the code corresponds directly to the algorithm steps. We mark all ending positions of t to avoid repeatedly scanning substrings. The DP array dp counts ways to end sequences at each position, while pref maintains cumulative sums to quickly compute contributions from previous positions. A subtle point is using i - m + 1 in the prefix sum lookup to avoid overlap.

Worked Examples

For s = "ababa" and t = "aba":

i	match[i]	dp[i+1]	pref[i+1]
0	0	0	0
1	0	0	0
2	1	1	1
3	0	0	1
4	1	4	5

The table shows that the first occurrence at i=2 contributes 1, and the second occurrence at i=4 contributes all previous combinations plus 1, totaling 5 ways, matching the sample output.

For s = "aaa" and t = "aa":

i	match[i]	dp[i+1]	pref[i+1]
0	0	0	0
1	1	1	1
2	1	2	3

This confirms the algorithm handles overlapping occurrences correctly.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n + m)	Finding all occurrences of `t` in `s` and filling DP arrays takes linear time
Space	O(n)	DP array and prefix sum array of size n+1

Given $n, m \le 10^5$, this solution fits comfortably within the 2-second time limit and 256 MB memory limit.

Test Cases

import sys, io

def run(inp: str) -> str:
    sys.stdin = io.StringIO(inp)
    from solution import main
    from contextlib import redirect_stdout
    out = io.StringIO()
    with redirect_stdout(out):
        main()
    return out.getvalue().strip()

# provided sample
assert run("ababa\naba\n") == "5", "sample 1"

# t does not appear
assert run("abc\nd\n") == "0", "t not present"

# multiple overlapping t
assert run("aaa\naa\n") == "3", "overlapping occurrences"

# minimum size input
assert run("a\na\n") == "1", "minimum size"

# t longer than s
assert run("ab\nabc\n") == "0", "t longer than s"

# all equal letters
assert run("aaaaa\na\n") == "16", "all equal letters"

# maximum size input
# here we just test runtime, not output correctness
s = "a" * 100000
t = "a" * 2
# run(s + "\n" + t + "\n")

Test input	Expected output	What it validates
"abc\nd"	0	t does not appear
"aaa\naa"	3	overlapping t occurrences
"a\na"	1	minimum-size strings
"ab\nabc"	0	t longer than s
"aaaaa\na"	16	all-equal letters, multiple combinations

Edge Cases

If s has no occurrence of t, match array remains zero, so dp never increments and the output is correctly 0. For overlapping occurrences, the algorithm ensures non-overlapping substrings are counted by only extending from positions before the current t ends. For s = "aaa" and t = "aa", the DP array correctly counts [aa at 0-1], [aa at 1-2], and [aa at 0-1 and 1-2] as separate valid configurations.