簡明程式解題入門 - 字串篇 III


Posted by KD Chang on 2021-01-15

前言

在軟體工程師和程式設計的工作或面試的過程中不會總是面對到只有簡單邏輯判斷或計算的功能,此時就會需要運用到演算法和資料結構的知識。本系列文將整理程式設計的工作或面試入門常見的程式解題問題和讀者一起探討,研究可能的解決方式(主要使用的程式語言為 Python)。其中字串處理操作是程式設計日常中常見的使用情境,本文繼續將從簡單的字串處理問題開始進行介紹。

要注意的是在 Python 的字串物件 strimmutable 不可變的,字串中的字元是無法單獨更改變換。

舉例來說,以下的第一個指定字元變數行為是不被允許的,但整體字串重新賦值指定另外一個字串物件是可以的:

name = 'Jack'

# 錯誤
name[0] = 'L'

# 正確
name = 'Leo'

執行結果:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

常見問題

字串轉整數

Problem: String to Integer (atoi)
Implement the myAtoi(string s) function, which converts a string to a 32-bit signed integer (similar to C/C++'s atoi function).

The algorithm for myAtoi(string s) is as follows:

  1. Read in and ignore any leading whitespace.
  2. Check if the next character (if not already at the end of the string) is '-' or '+'. Read this character in if it is either. This determines if the final result is negative or positive respectively. Assume the result is positive if neither is present.
  3. Read in next the characters until the next non-digit charcter or the end of the input is reached. The rest of the string is ignored.
  4. Convert these digits into an integer (i.e. "123" -> 123, "0032" -> 32). If no digits were read, then the integer is 0. Change the sign as necessary (from step 2).
  5. If the integer is out of the 32-bit signed integer range [-2^31, 2^31 - 1], then clamp the integer so that it remains in the range. Specifically, integers less than -2^31 should be clamped to -2^31, and integers greater than 2^31 - 1 should be clamped to 2^31 - 1.
  6. Return the integer as the final result.

Note:
Only the space character ' ' is considered a whitespace character.
Do not ignore any characters other than the leading whitespace or the rest of the string after the digits.

Example 1:

Input: str = "42"
Output: 42
Explanation: The underlined characters are what is read in, the caret is the current reader position.
Step 1: "42" (no characters read because there is no leading whitespace)
         ^
Step 2: "42" (no characters read because there is neither a '-' nor '+')
         ^
Step 3: "42" ("42" is read in)
           ^
The parsed integer is 42.
Since 42 is in the range [-2^31, 2^31 - 1], the final result is 42.

Example 2:

Input: str = "   -42"
Output: -42
Explanation:
Step 1: "   -42" (leading whitespace is read and ignored)
            ^
Step 2: "   -42" ('-' is read, so the result should be negative)
             ^
Step 3: "   -42" ("42" is read in)
               ^
The parsed integer is -42.
Since -42 is in the range [-2^31, 2^31 - 1], the final result is -42.

Example 3:

Input: str = "4193 with words"
Output: 4193
Explanation:
Step 1: "4193 with words" (no characters read because there is no leading whitespace)
         ^
Step 2: "4193 with words" (no characters read because there is neither a '-' nor '+')
         ^
Step 3: "4193 with words" ("4193" is read in; reading stops because the next character is a non-digit)
             ^
The parsed integer is 4193.
Since 4193 is in the range [-2^31, 2^31 - 1], the final result is 4193.

Example 4:

Input: str = "words and 987"
Output: 0
Explanation:
Step 1: "words and 987" (no characters read because there is no leading whitespace)
         ^
Step 2: "words and 987" (no characters read because there is neither a '-' nor '+')
         ^
Step 3: "words and 987" (reading stops immediately because there is a non-digit 'w')
         ^
The parsed integer is 0 because no digits were read.
Since 0 is in the range [-2^31, 2^31 - 1], the final result is 4193.

Example 5:

Input: str = "-91283472332"
Output: -2147483648
Explanation:
Step 1: "-91283472332" (no characters read because there is no leading whitespace)
         ^
Step 2: "-91283472332" ('-' is read, so the result should be negative)
          ^
Step 3: "-91283472332" ("91283472332" is read in)
                     ^
The parsed integer is -91283472332.
Since -91283472332 is less than the lower bound of the range [-2^31, 2^31 - 1], the final result is clamped to -231 = -2147483648.

Constraints:

0 <= s.length <= 200
s consists of English letters (lower-case and upper-case), digits (0-9), ' ', '+', '-', and '.'.

這個題目主要是要設計一個程式可以讓輸入的字串(輸入字串有可能有大小寫英文單字、,+-.)去除前方多餘空格和零並轉成整數(若為負號則為副整數),但有幾個限制條件,若不符合則不進行轉換回傳 0 或超過限制的最大最小值([-2^31, 2^31 - 1])回傳最大最小值。

Solution
參考方法:

class Solution(object):
    def myAtoi(self, s):
        """
        :type s: str
        :rtype: int
        """
        s = s.lstrip()
        if len(s) < 1:
            return 0

        # 建立是否為負數標籤
        minus_flag = False
        if s[0] in ['+', '-']:
            if s[0] == '-':
                minus_flag = True
            # 取符號後面的字串
            s = s[1:]

        # 判斷是否長度小於 1,若是代表沒有整數值
        if len(s) < 1:
            return 0

        # 判斷是否符號後面是接非整數字元
        if not s[0].isdigit():
            return 0

        # 建立一個可以將整數字元放入的串列
        str_list = []

        # 將字串中字元一一取出
        for s_chr in s:
            # 判斷是否為整數
            if s_chr.isdigit():
                str_list.append(s_chr)
            # 當遇到非整數字元則跳出迴圈
            else:
                break

        # 根據題目設置上下限
        INI_MAX = pow(2, 31) - 1
        INT_MIN = pow(2, 31) * -1

        # 根據是否為負整數來判斷回傳值
        if minus_flag:
            result_num = int(''.join(str_list)) * -1
            if result_num < INT_MIN:
                return INT_MIN
        else:
            result_num = int(''.join(str_list))
            if result_num > INI_MAX:
                return INI_MAX

        return result_num

本題主要需要針對題目規劃流程,依序實作:

  1. 去除最前面多餘空格
  2. 確定數字正負符號
  3. 使用迴圈逐一取出符合規則的整數,若過程中遇到非整數字元則跳出迴圈
  4. 將整數加上正負號(若整數最前方有 0 也要去除,使用 int() 函式去除)
  5. 判斷是否有在最大最小值限制中,若無則回傳上下界,否則回傳轉換結果

實作 strStr

Problem: Implement strStr()
Implement strStr().

Return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Clarification:

What should we return when needle is an empty string? This is a great question to ask during an interview.

For the purpose of this problem, we will return 0 when needle is an empty string. This is consistent to C's strstr() and Java's indexOf().

Example 1:

Input: haystack = "hello", needle = "ll"
Output: 2

Example 2:

Input: haystack = "aaaaa", needle = "bba"
Output: -1

Example 3:

Input: haystack = "", needle = ""
Output: 0

Constraints:

0 <= haystack.length, needle.length <= 5 * 104
haystack and needle consist of only lower-case English characters.

這個題目主要是實作類似 strStr() 函式的程式,可以判斷最初出現的子字串在字串中的索引位置,若無符合則回傳 -1。其中 haystack 代表字串,needle 代表比對的子字串。

Solution
參考方法:

class Solution(object):
    def strStr(self, haystack, needle):
        """
        :type haystack: str
        :type needle: str
        :rtype: int
        """
        if len(needle) > len(haystack):
            return -1

        # 若字串和子字串相等則回傳索引 0
        if needle == haystack:
            return 0

        str_length = len(haystack)
        substr_length = len(needle)
        # 從 0 開始往右切出子字串比對
        pointer = 0

        # 若是 pointer 位置 ≤ 字串減掉子字串長度代表還可以繼續比對
        while pointer <= (str_length - substr_length):
            # 以 pointer 為基準進行子字串比對,若符合則回傳 pointer 為索引
            if haystack[pointer: pointer + substr_length] == needle:
                return pointer
            else:
                pointer += 1

        return -1

主要透過 while 迴圈比對子字串的索引狀況並根據 Python 切片取出的子字串互相比對,若符合則回傳指標,不符合則繼續比對下去。若最後都沒有符合則回傳 -1

報數

Problem: Count and Say
The count-and-say sequence is a sequence of digit strings defined by the recursive formula:

countAndSay(1) = "1"
countAndSay(n) is the way you would "say" the digit string from countAndSay(n-1), which is then converted into a different digit string.
To determine how you "say" a digit string, split it into the minimal number of groups so that each group is a contiguous section all of the same character. Then for each group, say the number of characters, then say the character. To convert the saying into a digit string, replace the counts with a number and concatenate every saying.

For example, the saying and conversion for digit string "3322251":

Given a positive integer n, return the nth term of the count-and-say sequence.

Example 1:

Input: n = 1
Output: "1"
Explanation: This is the base case.

Example 2:

Input: n = 4
Output: "1211"

Explanation:

countAndSay(1) = "1"
countAndSay(2) = say "1" = one 1 = "11"
countAndSay(3) = say "11" = two 1's = "21"
countAndSay(4) = say "21" = one 2 + one 1 = "12" + "11" = "1211"

Constraints:

1 <= n <= 30

Solution
參考方法:

class Solution(object):
    def countAndSay(self, n):
        """
        :type n: int
        :rtype: str
        """
        # 限制 1 <= n <= 30
        if n not in range(1, 31):
            return "the input is error."
        # 當 n 等於 1 時回傳 1
        if n == 1:
            return '1'

        # 使用遞迴
        s = self.countAndSay(n - 1)

        # 初始化變數
        say_str = ''
        count = 1
        index = 0

        # 當字串長度大於 index 進入迴圈(結束條件)
        while index < len(s):
            # 當前後值相等,count 加 1(必須確認 index + 1 還有值)
            if index + 1 < len(s) and s[index] == s[index + 1]:
                count += 1
            else:
                # 將 count 和對應數值加入回傳字串
                say_str += str(count) + str(s[index])
                count = 1
            index += 1
        return say_str

首先我們必須先了解一下題目所謂的 Count and Say 是什麼意思。

意思是輸出的字串格式會是:

{數量}{數值}

舉例而言,若整數字串為:

1

則輸出字串:

# one one 一個一的意思
11

若整數字串為:

11

則輸出字串:

# two one 兩個一的意思
21

若整數字串為:

21

則輸出字串:

# one two one one 一個一,兩個一的意思
1211

會從輸入字串開始讀取,若連續一樣整數,則累加計數,依此類推。

在這邊我們使用遞迴方式,每次計算好的字串當作下一次的初始字串使用。其中若索引遇到前後值相同則,計數加一。若索引前後不同則將累計的計數和值加到回傳字串上。

最長公共前綴

Problem: Longest Common Prefix
Write a function to find the longest common prefix string amongst an array of strings.

If there is no common prefix, return an empty string "".

Example 1:

Input: strs = ["flower","flow","flight"]
Output: "fl"

Example 2:

Input: strs = ["dog","racecar","car"]
Output: ""
Explanation: There is no common prefix among the input strings.

Constraints:

0 <= strs.length <= 200
0 <= strs[i].length <= 200
strs[i] consists of only lower-case English letters.

Solution
參考方法:

class Solution:
    def longestCommonPrefix(self, strs: List[str]) -> str:
        # 去除串列長度為 0 和字串為空值的情況
        if len(strs) == 0 or '' in strs:
            return ''
        # 若串列長度為一,則回傳第一個字串
        if len(strs) == 1:
            return strs[0]

        # 建立取出共同前綴字的串列空間
        common_prefix = []
        # 最大前綴長度不超過最短字串的長度
        min_str_length = min([len(str) for str in strs])

        # 使用最短字串的長度當作迴圈條件一一取出字元放入共同前綴字的串列空間
        for i in range(min_str_length):
            for string in strs:
                common_prefix.append(string[:i + 1])

            # 若共同前綴字的串列空間取 set 長度大於一代表已有分歧,回傳上一個共同結果
            if len(set(common_prefix)) == 1:
                common_prefix = []
            else:
                return strs[0][:i]

        # 回傳到最短字串的長度的子字串
        return strs[0][:min_str_length]

透過取出字串串列中最短字串的長度,我們可以使用 for 迴圈當作迴圈條件一一取出字元放入共同前綴字的串列空間進行比較。若共同前綴字的串列空間取 set(可計算不重複字串)長度大於一代表已有分歧,回傳上一個共同結果為最大前綴,否則回傳到最短字串的長度的子字串。

總結

字串處理系列到這邊告一段落,本系列文將持續整理程式設計的工作或面試入門常見的程式解題問題和讀者一起探討,研究可能的解決方式(主要使用的程式語言為 Python)。以上介紹了幾個簡單字串操作的問題當作開頭,需要注意的是 Python 本身內建的操作方式和其他程式語言可能不同,文中所列的不一定是最好的解法,讀者可以自行嘗試在時間效率和空間效率運用上取得最好的平衡點。

參考文件

  1. 8. String to Integer (atoi)

#Coding Challenge #string #程式設計 #Leetcode









Related Posts

querySelector 和 getElement 差別

querySelector 和 getElement 差別

什麼是 SPA ?

什麼是 SPA ?

Post-Exploitation Basics Walkthrough

Post-Exploitation Basics Walkthrough




Newsletter




Comments