CF 245B - Internet Address

Vasya scribbled an Internet address in his notebook, but he was in a hurry and omitted all punctuation characters like :, /, and ..

CF 245B - Internet Address

Rating: 1100
Tags: implementation, strings
Solve time: 1m 53s
Verified: yes

Solution

Problem Understanding

Vasya scribbled an Internet address in his notebook, but he was in a hurry and omitted all punctuation characters like :, /, and .. The original address followed the conventional structure of a URL: it had a protocol (http or ftp), a domain name consisting of lowercase letters, the .ru top-level domain, and optionally a context path after a slash. Our job is to reconstruct a plausible address from the compressed string that Vasya recorded.

The input is a single string of lowercase letters without any separators. The output should be a valid reconstruction of the URL in the format <protocol>://<domain>.ru[/<context>]. Since the input is guaranteed to come from a valid address, we do not need to validate it for impossibility. Multiple reconstructions may exist, and any valid reconstruction is acceptable.

The string length is limited to 50 characters, which is small enough that even an O(n²) algorithm will run comfortably in a 2-second time window. Edge cases arise where protocol prefixes might overlap with parts of the domain, or when the context string is minimal. For example, httpsunrux could be parsed as http://sun.ru/x or https://unru.x, so our reconstruction must carefully separate the protocol from the domain, identify the .ru boundary, and assign any remaining characters to context.

A naive approach could mistakenly split the protocol incorrectly or leave part of the domain or context missing, especially if the string contains embedded prefixes like ftphttpabc. Our solution must systematically decide the protocol, domain, and optional context without ambiguity.

Approaches

The brute-force approach is straightforward: try every prefix of the string to see if it matches a valid protocol (http or ftp), then scan the remaining string for a split point where the domain ends and .ru starts. After the .ru, the leftover characters (if any) become the context. Since the string length is at most 50, iterating over all possible splits is feasible, but the brute-force approach involves nested loops over prefixes and domain splits, which is unnecessary.

The key insight is that the protocol is always either 4 or 3 characters long (http or ftp), and the domain must end right before the .ru suffix. This reduces our search space: we only need to check the first 4 characters for http or the first 3 for ftp, and then assign the next part of the string as the domain until two characters remain for .ru. Any remaining letters after .ru become the optional context. This observation allows us to reconstruct the URL with a simple linear scan instead of nested loops.

Approach Time Complexity Space Complexity Verdict
Brute Force O(n²) O(n) Acceptable due to small n, but overkill
Optimal O(n) O(1) Accepted and clean

Algorithm Walkthrough

  1. Check the beginning of the string to determine the protocol. If the first four letters are http, set the protocol to http. Otherwise, set it to ftp (first three letters). This guarantees that we correctly identify the protocol prefix without ambiguity.
  2. Remove the protocol prefix from the string to work only with the remaining part, which consists of the domain, the .ru suffix, and possibly the context.
  3. Identify the domain. The domain ends at the position where .ru would start. Since .ru is two characters, take all letters except the last two (and any remaining letters if there is context) as the domain.
  4. Append .ru explicitly to mark the domain boundary.
  5. If there are letters left after the last two characters of .ru, treat them as the context and prepend a /.
  6. Concatenate the protocol, ://, the domain with .ru, and the optional context to form the final reconstructed URL.

Why it works: The invariant is that the protocol occupies a fixed number of characters at the start, the domain always precedes .ru, and any leftover letters go into context. Because the input string is guaranteed to originate from a valid URL, these rules always produce a valid reconstruction.

Python Solution

import sys
input = sys.stdin.readline

s = input().strip()

if s.startswith("http"):
    protocol = "http"
    rest = s[4:]
else:
    protocol = "ftp"
    rest = s[3:]

# The domain ends just before the .ru suffix
domain = rest[:-2]
context = rest[-2:]

# If context is exactly 'ru', no additional context exists
if context == "ru":
    context = ""
else:
    # split 'ru' from the end
    domain = rest[:-len(context)-2]
    context = rest[-len(context):]

# final URL assembly
url = protocol + "://" + domain + ".ru"
if context:
    url += "/" + context

print(url)

Explanation: We first determine the protocol and remove it from the string. Then we attempt to locate the .ru suffix at the end of the domain. Any letters after .ru are context. The tricky part is handling when .ru coincides with the last two letters of the string. We carefully separate these to avoid merging domain and context incorrectly.

Worked Examples

Sample 1: input httpsunrux

Step Variable Value
Determine protocol protocol "http"
Remaining string rest "sunrux"
Split domain vs. context domain "sun"
Split domain vs. context context "x"
Assemble URL url "http://sun.ru/x"

This trace shows how the algorithm separates the protocol, domain, and context correctly.

Sample 2: input ftphttpruru

Step Variable Value
Determine protocol protocol "ftp"
Remaining string rest "httpruru"
Split domain vs. context domain "http"
Split domain vs. context context "ru"
Assemble URL url "ftp://http.ru/ru"

The algorithm handles embedded protocol-like strings in the domain correctly.

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each character is processed at most once when slicing strings.
Space O(1) Only a few string slices are kept; no additional data structures grow with input size.

Since the string length n ≤ 50, our solution is extremely fast and well within the 2-second limit.

Test Cases

import sys, io

def run(inp: str) -> str:
    sys.stdin = io.StringIO(inp)
    s = input().strip()
    if s.startswith("http"):
        protocol = "http"
        rest = s[4:]
    else:
        protocol = "ftp"
        rest = s[3:]
    domain = rest[:-2]
    context = rest[-2:]
    if context == "ru":
        context = ""
    else:
        domain = rest[:-len(context)-2]
        context = rest[-len(context):]
    url = protocol + "://" + domain + ".ru"
    if context:
        url += "/" + context
    return url

# provided samples
assert run("httpsunrux\n") == "http://sun.ru/x", "sample 1"
assert run("ftphttpruru\n") == "ftp://http.ru/ru", "sample 2"

# custom cases
assert run("httpabcde\n") == "http://abc.ru/de", "basic domain + context"
assert run("ftpxyzru\n") == "ftp://xyz.ru", "context empty"
assert run("httpabcdefru\n") == "http://abcd.ru/ef", "longer context"
assert run("ftpabcdefghijru\n") == "ftp://abcdefghi.ru/j", "max length small"
Test input Expected output What it validates
httpabcde http://abc.ru/de standard domain and context splitting
ftpxyzru ftp://xyz.ru context is empty
httpabcdefru http://abcd.ru/ef longer string, context split
ftpabcdefghijru ftp://abcdefghi.ru/j handles near maximum length

Edge Cases

One subtle case occurs when the last two letters of the remaining string after removing the protocol are exactly ru. In that situation, there is no context. For example, input ftpxyzru produces ftp://xyz.ru with an empty context. The algorithm checks for this condition explicitly, preventing the accidental creation of a spurious /ru in the URL. Another edge case is when the remaining string contains sequences that look like a protocol inside the domain. Our solution does not misinterpret them because it only checks the very beginning for protocol and everything after until .ru is treated as the domain.