CF 39J - Spelling Check

Rating: 1500
Tags: hashing, implementation, strings
Solve time: 1m 34s
Verified: yes

Solution

Problem Understanding

We are given two lowercase strings. The first string is exactly one character longer than the second. We want to find every position in the first string such that removing the character at that position makes the two strings identical.

The task is not to perform arbitrary edits. Only one deletion is allowed, and it must happen in the longer string. The answer consists of all valid 1-based positions where this deletion works.

The constraints are the part that changes the nature of the problem completely. Each string can contain up to $10^6$ characters. A quadratic solution is impossible here. Even an $O(n^2)$ algorithm with small constants would require around $10^{12}$ operations in the worst case, which is far beyond the time limit. A linear or near-linear approach is required.

A direct simulation for every deletion position is dangerous because copying or rebuilding strings repeatedly becomes extremely expensive at this scale. Even creating $n$ temporary strings of length $n$ would use huge amounts of memory and time.

Several edge cases are easy to mishandle.

Consider the case where every deletion works:

Input:

aaaa
aaa

Output:

4
1 2 3 4

Removing any character produces "aaa". A careless mismatch-based solution might stop after finding the first valid position and miss the others.

Another tricky case appears when the mismatch happens near the end:

Input:

abcd
abc

Output:

1
4

The first three characters already match. The only valid deletion is the last character. Algorithms that only search for the first mismatch may incorrectly conclude that no deletion is needed.

Repeated characters also create ambiguity:

Input:

abcc
abc

Output:

1
4

Deleting the last 'c' works, but deleting the first 'c' does not. A simplistic frequency-based method would fail because both strings contain almost the same character counts.

Finally, there are cases with no valid answer:

Input:

abcd
axy

Output:

Even after deleting one character from "abcd", the strings can never become equal.

Approaches

The brute-force idea is straightforward. Try deleting every character from the longer string and compare the result with the shorter string.

For a string of length $n$, there are $n$ candidate deletions. Constructing and comparing each resulting string costs $O(n)$, so the total complexity becomes $O(n^2)$.

With $n = 10^6$, this approach would require around $10^{12}$ character operations. That is completely infeasible.

The key observation is that after deleting one character, the remaining parts of the strings must line up perfectly.

Suppose we delete position $i$ from the longer string $s$. Then:

The prefix before $i$ must already match.
The suffix after $i$ in $s$ must match the corresponding suffix in $t$, shifted by one position.

This naturally suggests preprocessing prefix and suffix matches.

Define:

pref[i] as whether s[0:i] == t[0:i]
suff[i] as whether s[i:] == t[i-1:]

The suffix definition accounts for the one-character shift caused by deletion.

Now a position $i$ is valid if:

everything before $i$ matches
everything after $i$ matches after shifting

So we only need constant-time checks per position after linear preprocessing.

This reduces the complexity from quadratic to linear.

Approach	Time Complexity	Space Complexity	Verdict
Brute Force	$O(n^2)$	$O(n)$	Too slow
Optimal	$O(n)$	$O(n)$	Accepted

Algorithm Walkthrough

Read the two strings s and t.
Let n = len(s). Since s is guaranteed to be one character longer, len(t) = n - 1.
Build a prefix match array pref.

pref[i] means that the first i characters of both strings are identical.

Initialize:

pref[0] = True

Then for every i from 0 to n-2:

pref[i + 1] = pref[i] and (s[i] == t[i])

This works because prefixes match up to i+1 only if the previous prefixes matched and the current characters are equal. 4. Build a suffix match array suff.

suff[i] means:

s[i:] == t[i-1:]

The indices are shifted because one character from s is assumed deleted.

Initialize:

suff[n] = True

Then iterate backward:

suff[i] = suff[i + 1] and (s[i] == t[i - 1])

Check every deletion position i.

Deleting s[i] works if:

pref[i] and suff[i + 1]

pref[i] guarantees the left parts match.

suff[i + 1] guarantees the right parts match after skipping s[i]. 6. Output all valid positions using 1-based indexing.

Why it works

For any deletion position $i$, the resulting string consists of two unchanged parts:

the prefix before $i$
the suffix after $i$

The prefix condition checks that:

s[0:i] == t[0:i]

The suffix condition checks that:

s[i+1:] == t[i:]

If both are true, then every character in the resulting string matches the target string.

If either condition fails, some character differs and the deletion cannot work.

Since every candidate position is tested exactly against these necessary and sufficient conditions, the algorithm is correct.

Python Solution

import sys
input = sys.stdin.readline

s = input().strip()
t = input().strip()

n = len(s)

pref = [False] * (n + 1)
pref[0] = True

for i in range(n - 1):
    pref[i + 1] = pref[i] and (s[i] == t[i])

suff = [False] * (n + 1)
suff[n] = True

for i in range(n - 1, 0, -1):
    suff[i] = suff[i + 1] and (s[i] == t[i - 1])

ans = []

for i in range(n):
    if pref[i] and suff[i + 1]:
        ans.append(i + 1)

print(len(ans))

if ans:
    print(*ans)

The prefix array stores whether prefixes match exactly up to a certain length. Using cumulative boolean propagation avoids recomputing comparisons repeatedly.

The suffix array is slightly more subtle because the indices are shifted. After deleting s[i], character s[i+1] must align with t[i]. That is why the suffix comparison uses t[i-1] during preprocessing.

The backward loop starts from n - 1 and stops at 1 because t[i - 1] must remain valid. Position 0 does not need a suffix value since deleting the first character only depends on suff[1].

The final loop checks every possible deletion independently in constant time.

The implementation avoids building temporary strings entirely. This is critical for large inputs because repeated slicing would dramatically increase both runtime and memory usage.

Worked Examples

Example 1

Input:

abdrakadabra
abrakadabra

i	s[i]	pref[i]	suff[i+1]	Valid
0	a	True	False	No
1	b	True	False	No
2	d	True	True	Yes
3	r	False	False	No

Output:

1
3

Deleting the third character 'd' transforms "abdrakadabra" into "abrakadabra".

This example demonstrates how the algorithm isolates the exact mismatch location while keeping all surrounding segments verified.

Example 2

Input:

aaaa
aaa

i	s[i]	pref[i]	suff[i+1]	Valid
0	a	True	True	Yes
1	a	True	True	Yes
2	a	True	True	Yes
3	a	True	True	Yes

Output:

4
1 2 3 4

Every deletion produces the same resulting string.

This case confirms that the algorithm correctly handles repeated characters and does not stop after the first match.

Complexity Analysis

Measure	Complexity	Explanation
Time	$O(n)$	One forward pass, one backward pass, one validation pass
Space	$O(n)$	Prefix and suffix arrays

With strings up to $10^6$ characters, linear complexity is exactly what the problem requires. The solution performs only a few passes over the data and easily fits within the time limit. The memory usage is also safe under the 256 MB limit.

Test Cases

# helper: run solution on input string, return output string
import sys, io

def solve():
    input = sys.stdin.readline

    s = input().strip()
    t = input().strip()

    n = len(s)

    pref = [False] * (n + 1)
    pref[0] = True

    for i in range(n - 1):
        pref[i + 1] = pref[i] and (s[i] == t[i])

    suff = [False] * (n + 1)
    suff[n] = True

    for i in range(n - 1, 0, -1):
        suff[i] = suff[i + 1] and (s[i] == t[i - 1])

    ans = []

    for i in range(n):
        if pref[i] and suff[i + 1]:
            ans.append(i + 1)

    out = [str(len(ans))]

    if ans:
        out.append(" ".join(map(str, ans)))

    print("\n".join(out))

def run(inp: str) -> str:
    backup_stdin = sys.stdin
    backup_stdout = sys.stdout

    sys.stdin = io.StringIO(inp)
    sys.stdout = io.StringIO()

    solve()

    result = sys.stdout.getvalue()

    sys.stdin = backup_stdin
    sys.stdout = backup_stdout

    return result

# provided sample
assert run(
    "abdrakadabra\nabrakadabra\n"
) == "1\n3\n", "sample 1"

# minimum size
assert run(
    "ab\na\n"
) == "1\n2\n", "minimum size"

# all deletions valid
assert run(
    "aaaa\naaa\n"
) == "4\n1 2 3 4\n", "all positions valid"

# deletion at beginning
assert run(
    "xabc\nabc\n"
) == "1\n1\n", "delete first character"

# deletion at end
assert run(
    "abcd\nabc\n"
) == "1\n4\n", "delete last character"

# no valid deletion
assert run(
    "abcd\naxy\n"
) == "0\n", "impossible case"

Test input	Expected output	What it validates
`ab / a`	`2`	Smallest meaningful input
`aaaa / aaa`	`1 2 3 4`	Multiple valid answers
`xabc / abc`	`1`	Correct handling of first position
`abcd / abc`	`4`	Correct handling of last position
`abcd / axy`	`0`	Impossible transformation

Edge Cases

Consider repeated characters:

Input:

aaaa
aaa

The prefix array remains true at every position because all compared characters are equal. The suffix array also remains true everywhere for the same reason.

For every index i:

pref[i] = True
suff[i+1] = True

So all positions are accepted.

Now consider deletion at the end:

Input:

abcd
abc

The prefix comparisons succeed for the first three characters:

a == a
b == b
c == c

When checking deletion at position 4:

pref[3] = True
suff[4] = True

The suffix condition is vacuously true because nothing remains after the deleted character.

The algorithm correctly outputs:

1
4

Now consider an impossible case:

Input:

abcd
axy

The prefix comparison fails immediately after the first character:

b != x

No deletion position can satisfy both prefix and suffix conditions simultaneously.

The algorithm outputs:

Finally, consider ambiguity caused by repeated letters:

Input:

abcc
abc

Deleting the last 'c' works:

abc == abc

Deleting the first 'c' produces:

abc != abc

Actually:

abcc -> abc

only after removing the fourth character.

The prefix and suffix conditions distinguish these cases precisely, so the algorithm outputs:

1
4