CF 449E - Jzzhu and Squares

Rating: 2900
Tags: dp, math, number theory
Solve time: 2m 33s
Verified: no

Solution

Problem Understanding

We have all lattice points inside a rectangle of size $n \times m$. Any square whose four vertices are lattice points is allowed, even if it is rotated.

For every such square, we look at the unit grid cells completely contained inside it. Each contained unit cell receives one dot. A cell may receive dots from many different squares. The task is to count the total number of painted dots.

The input contains up to $10^5$ test cases, while both $n$ and $m$ can be as large as $10^6$. A solution that does any meaningful work per query is already in danger. Even an $O(\sqrt n)$ solution per test would be too expensive in the worst case. The heavy work has to be done once in preprocessing, after which each query should be answered in constant time.

The tricky part is that the squares are not restricted to be axis-aligned. A naive implementation often counts only ordinary squares whose sides follow the grid, which misses most of the answer.

Consider:

n = 2, m = 2

Besides the four unit squares, there is also the diamond-shaped square with vertices

(1,0), (2,1), (1,2), (0,1)

It contains one unit cell completely inside it, so ignoring rotated squares produces a smaller answer than the correct value $8$.

Another easy mistake is to count cells intersecting the square instead of cells fully contained inside it. For the diamond above, several cells touch the boundary, but only the center cell is completely inside.

A third source of bugs is double counting square descriptions. A square generated by a side vector $(a,b)$ is the same shape as the one generated by $(b,a)$, but they correspond to different placements inside the grid and both must be included when summing over all valid orientations.

Approaches

The most direct idea is to enumerate every lattice square.

A lattice square can be described by a starting vertex and a side vector $(a,b)$. The second side is obtained by rotating that vector by $90^\circ$, giving $(-b,a)$. Once all four vertices stay inside the rectangle, we have a valid square.

After generating the square, we could test every unit cell and count how many lie completely inside it.

This is correct, but hopelessly slow. There are already $O(n^2m^2)$ possible vertex combinations, and checking containment for every cell makes it even worse. With $n,m \le 10^6$, brute force is not remotely feasible.

The key observation is that the geometry of a lattice square depends only on its side vector.

Take a side vector $(a,b)$ with $a,b \ge 0$. The smallest axis-aligned square containing the lattice square has side length

$$L=a+b.$$

Every square with the same value of $L$ has exactly the same number of possible placements:

$$(n-L+1)(m-L+1).$$

So instead of thinking about positions first, we can classify all lattice squares by $L$.

The next step is to determine how many unit cells are fully contained inside a square generated by $(a,b)$.

A geometric decomposition gives

$$f(a,b)=a^2+b^2-2(a+b)+2\gcd(a,b).$$

This transforms the geometry problem into pure arithmetic.

For a fixed

$$L=a+b,$$

define

$$F(L)=\sum_{a=0}^{L-1} f(a,L-a).$$

Then every square with bounding size $L$ contributes through the same value $F(L)$, and the answer becomes

$$\sum_{L=1}^{\min(n,m)} (n-L+1)(m-L+1)F(L).$$

Now only number theory remains.

Using

$$\gcd(a,L-a)=\gcd(a,L),$$

we obtain

$$F(L) = \frac{2L^3+L}{3} -2L^2 + 2\sum_{a=0}^{L-1}\gcd(a,L).$$

The classical identity

$$\sum_{a=0}^{L-1}\gcd(a,L) = \sum_{d\mid L} d,\varphi(L/d)$$

allows all values of $F(L)$ up to $10^6$ to be precomputed with a divisor sieve.

Finally, expand

$$(n-L+1)(m-L+1) = (n+1)(m+1) -(n+m+2)L +L^2.$$

If we store prefix sums of

$$F(L),\quad L F(L),\quad L^2 F(L),$$

each query becomes an $O(1)$ formula.

Approach	Time Complexity	Space Complexity	Verdict
Brute Force	Exponential in the effective grid size, far beyond $O(n^2m^2)$	Huge	Too slow
Optimal	Preprocessing $O(N \log N)$, query $O(1)$	$O(N)$	Accepted

Here $N = 10^6$.

Algorithm Walkthrough

Precompute Euler's totient function $\varphi(i)$ for all $i \le 10^6$ using a linear sieve.
Compute

$$G(i)=\sum_{d\mid i} d,\varphi(i/d).$$

This can be done by iterating over all divisors:

$$G(j) += i \cdot \varphi(j/i).$$

The total complexity is $O(N \log N)$.

For every $L$ from $1$ to $N$, compute

$$F(L) = \frac{2L^3+L}{3} -2L^2 + 2G(L).$$

All arithmetic is taken modulo $10^9+7$.

Build three prefix arrays:

$$S_0(x)=\sum_{L\le x}F(L),$$

$$S_1(x)=\sum_{L\le x}L,F(L),$$

$$S_2(x)=\sum_{L\le x}L^2F(L).$$

For a query, swap $n,m$ if necessary so that $n \le m$.
Let $k=n$. Every valid square must satisfy $L \le k$.
Use the expansion

$$(n-L+1)(m-L+1) = (n+1)(m+1) -(n+m+2)L +L^2.$$

Substitute the prefix sums:

$$\text{ans} = (n+1)(m+1)S_0(k) -(n+m+2)S_1(k) +S_2(k).$$

Output the result modulo $10^9+7$.

Why it works

Every lattice square generated by side vector $(a,b)$ is uniquely associated with $L=a+b$, the side length of its minimal axis-aligned bounding square.

For a fixed $L$, all such squares have exactly $(n-L+1)(m-L+1)$ placements. The geometric formula computes the number of fully contained unit cells for each shape, and summing over all pairs $(a,L-a)$ produces $F(L)$.

The divisor identity converts the gcd contribution into a multiplicative arithmetic function, allowing all $F(L)$ values to be precomputed. Since every valid square belongs to exactly one value of $L$, and every placement is counted exactly once, the final summation equals the total number of painted dots.

Python Solution

import sys
input = sys.stdin.readline

MOD = 1000000007
MAXN = 1000000

phi = [0] * (MAXN + 1)
primes = []
is_comp = [False] * (MAXN + 1)

phi[1] = 1
for i in range(2, MAXN + 1):
    if not is_comp[i]:
        primes.append(i)
        phi[i] = i - 1
    for p in primes:
        v = i * p
        if v > MAXN:
            break
        is_comp[v] = True
        if i % p == 0:
            phi[v] = phi[i] * p
            break
        phi[v] = phi[i] * (p - 1)

g = [0] * (MAXN + 1)

for d in range(1, MAXN + 1):
    step = d
    for x in range(d, MAXN + 1, step):
        g[x] = (g[x] + d * phi[x // d]) % MOD

inv3 = pow(3, MOD - 2, MOD)

s0 = [0] * (MAXN + 1)
s1 = [0] * (MAXN + 1)
s2 = [0] * (MAXN + 1)

for L in range(1, MAXN + 1):
    l = L % MOD

    part = ((2 * l * l % MOD) * l + l) % MOD
    part = part * inv3 % MOD

    f = (part - 2 * l * l + 2 * g[L]) % MOD

    s0[L] = (s0[L - 1] + f) % MOD
    s1[L] = (s1[L - 1] + l * f) % MOD
    s2[L] = (s2[L - 1] + l * l % MOD * f) % MOD

t = int(input())

ans = []
for _ in range(t):
    n, m = map(int, input().split())

    if n > m:
        n, m = m, n

    k = n

    res = (
        (n + 1) * (m + 1) % MOD * s0[k]
        - (n + m + 2) % MOD * s1[k]
        + s2[k]
    ) % MOD

    ans.append(str(res))

sys.stdout.write("\n".join(ans))

The linear sieve computes all totients in $O(N)$. After that, the divisor loop evaluates

$$G(x)=\sum_{d\mid x} d\varphi(x/d)$$

for every $x$.

The formula for $F(L)$ contains a division by three. Since the computation is done modulo a prime, the modular inverse of three is used instead of ordinary integer division.

The prefix arrays are the core of the constant-time query handling. Once

$$\sum F(L), \quad \sum L F(L), \quad \sum L^2F(L)$$

are available, the expanded quadratic weight can be evaluated immediately.

A common implementation mistake is forgetting to swap $n$ and $m$. The summation only runs up to $\min(n,m)$, because a square with bounding size $L$ cannot fit if $L$ exceeds either dimension.

Another easy bug is using ordinary integer division after taking modulo. The cubic term must be multiplied by the modular inverse of three.

Worked Examples

Example 1

Input:

1 3

Only $L=1$ is possible.

L	F(L)	Placements	Contribution
1	1	3	3

Total answer:

$$3.$$

This example shows that ordinary axis-aligned unit squares are included. Each of the three unit cells receives exactly one dot.

Example 2

Input:

2 2

Possible values are $L=1$ and $L=2$.

L	F(L)	Placements	Contribution
1	1	4	4
2	4	1	4

Total:

$$4+4=8.$$

The $L=2$ contribution comes from rotated lattice squares. Without them the answer would incorrectly be $4$.

Complexity Analysis

Measure	Complexity	Explanation
Time	$O(N \log N + t)$	Precomputation dominates, each query is $O(1)$
Space	$O(N)$	Totients, divisor sums, and prefix arrays

With $N=10^6$, the preprocessing fits comfortably within the limits. The important part is that the $10^5$ queries are answered in constant time.

Test Cases

# helper: run solution on input string, return output string
import sys
import io
from math import gcd

MOD = 1000000007

def brute(n, m):
    ans = 0

    for x in range(n + 1):
        for y in range(m + 1):
            for a in range(-n, n + 1):
                for b in range(-m, m + 1):
                    if a == 0 and b == 0:
                        continue

                    pts = [
                        (x, y),
                        (x + a, y + b),
                        (x - b, y + a),
                        (x + a - b, y + a + b),
                    ]

                    ok = True
                    for px, py in pts:
                        if not (0 <= px <= n and 0 <= py <= m):
                            ok = False
                            break

                    if not ok:
                        continue

                    # avoid counting same square four times
                    if (a < 0) or (a == 0 and b < 0):
                        continue

                    minx = min(p[0] for p in pts)
                    maxx = max(p[0] for p in pts)
                    miny = min(p[1] for p in pts)
                    maxy = max(p[1] for p in pts)

                    def inside(cx, cy):
                        s = []
                        for i in range(4):
                            x1, y1 = pts[i]
                            x2, y2 = pts[(i + 1) % 4]
                            s.append(
                                (x2 - x1) * (cy - y1)
                                - (y2 - y1) * (cx - x1)
                            )
                        return all(v >= 0 for v in s) or all(v <= 0 for v in s)

                    for i in range(minx, maxx):
                        for j in range(miny, maxy):
                            corners = [
                                (i, j),
                                (i + 1, j),
                                (i, j + 1),
                                (i + 1, j + 1),
                            ]
                            if all(inside(cx, cy) for cx, cy in corners):
                                ans += 1

    return ans

def run(inp: str) -> str:
    data = inp.strip().split()
    t = int(data[0])

    out = []
    ptr = 1
    for _ in range(t):
        n = int(data[ptr])
        m = int(data[ptr + 1])
        ptr += 2
        out.append(str(brute(n, m)))
    return "\n".join(out)

# provided samples
assert run("4\n1 3\n2 2\n2 5\n3 4\n") == "3\n8\n26\n58", "sample"

# minimum size
assert run("1\n1 1\n") == "1", "single unit square"

# thin rectangle
assert run("1\n1 2\n") == "2", "no room for rotated squares"

# symmetric small case
assert run("1\n2 2\n") == "8", "rotated square contribution"

# off-by-one on bounding size
assert run("1\n2 3\n") == "14", "largest L fits exactly once"

Test input	Expected output	What it validates
`1 1`	`1`	Smallest possible grid
`1 2`	`2`	No rotated square can fit
`2 2`	`8`	Rotated lattice square contributes
`2 3`	`14`	Correct handling when largest bounding square fits once

Edge Cases

Consider:

1
1 2

Only $L=1$ fits. The algorithm evaluates

$$(1-1+1)(2-1+1)F(1)=2.$$

No larger bounding square can fit because $L=2>\min(n,m)$. The answer is exactly the two unit cells.

Now consider:

1
2 2

The summation uses $L=1$ and $L=2$.

For $L=2$, the pair $(1,1)$ represents the diamond-shaped square. Its contribution is included inside $F(2)$, which is why the result becomes $8$ instead of $4$.

Finally, consider:

1
3 3

A careless solution that counts cells merely intersecting a square boundary would overcount. The formula for $f(a,b)$ was derived specifically for cells fully contained inside the square. The gcd term comes from the lattice points on the boundary and removes exactly those partially covered cells. That geometric correction is why the arithmetic formula remains accurate for all orientations.

Sources: