元老與新秀：Go sort.Search()和sort.Find()

作者：Dashen 2024-03-01 07:34:26

Find的第二個入參,也是一個func,但要求這個func的返回值是int而不是bool.另外Find的返回值有兩個,第二個返回值是bool,代表沒有找到指定元素。

sort.Search()

sort.Search() 提交于遙遠的2010年11月11日,提交者是Go三位創始人之一的Robert Griesemer[1], 隨Go第一個正式版本一起發布

從這個意義上說,是標準庫元老級的函數了~

sort.Search()[2] 用于在排序的切片或數組中查找元素

// Search uses binary search to find and return the smallest index i
// in [0, n) at which f(i) is true, assuming that on the range [0, n),
// f(i) == true implies f(i+1) == true. That is, Search requires that
// f is false for some (possibly empty) prefix of the input range [0, n)
// and then true for the (possibly empty) remainder; Search returns
// the first true index. If there is no such index, Search returns n.
// (Note that the "not found" return value is not -1 as in, for instance,
// strings.Index.)
// Search calls f(i) only for i in the range [0, n).
//
// A common use of Search is to find the index i for a value x in
// a sorted, indexable data structure such as an array or slice.
// In this case, the argument f, typically a closure, captures the value
// to be searched for, and how the data structure is indexed and
// ordered.
//
// For instance, given a slice data sorted in ascending order,
// the call Search(len(data), func(i int) bool { return data[i] >= 23 })
// returns the smallest index i such that data[i] >= 23. If the caller
// wants to find whether 23 is in the slice, it must test data[i] == 23
// separately.
//
// Searching data sorted in descending order would use the <=
// operator instead of the >= operator.
//
// To complete the example above, the following code tries to find the value
// x in an integer slice data sorted in ascending order:
//
// x := 23
// i := sort.Search(len(data), func(i int) bool { return data[i] >= x })
// if i < len(data) && data[i] == x {
//  // x is present at data[i]
// } else {
//  // x is not present in data,
//  // but i is the index where it would be inserted.
// }
//
// As a more whimsical example, this program guesses your number:
//
// func GuessingGame() {
//  var s string
//  fmt.Printf("Pick an integer from 0 to 100.\n")
//  answer := sort.Search(100, func(i int) bool {
//   fmt.Printf("Is your number <= %d? ", i)
//   fmt.Scanf("%s", &s)
//   return s != "" && s[0] == 'y'
//  })
//  fmt.Printf("Your number is %d.\n", answer)
// }
func Search(n int, f func(int) bool) int {
 // Define f(-1) == false and f(n) == true.
 // Invariant: f(i-1) == false, f(j) == true.
 i, j := 0, n
 for i < j {
  h := int(uint(i+j) >> 1) // avoid overflow when computing h
  // i ≤ h < j
  if !f(h) {
   i = h + 1 // preserves f(i-1) == false
  } else {
   j = h // preserves f(j) == true
  }
 }
 // i == j, f(i-1) == false, and f(j) (= f(i)) == true  =>  answer is i.
 return i
}

這段代碼是一個實現了二分查找算法的函數，名為 Search。這個函數用于在一個有序的范圍內尋找滿足特定條件的最小索引 i。以下是對這個函數的詳細解釋：

函數定義：Search(n int, f func(int) bool) int 定義了一個名為 Search 的函數，接收兩個參數：一個整數 n 和一個函數 f。n 定義了搜索范圍的大小（從 0 到 n，不包括 n），而 f 是一個接受整數輸入并返回布爾值的函數。Search 函數返回一個整數，表示滿足 f 條件的最小索引。
**條件函數 f**：f 函數定義了查找的條件。對于索引 i，如果 f(i) 為真，則對于所有 j > i，f(j) 也應為真。這意味著 f 在某點之前為假，之后變為真。Search 函數查找的是這個轉變發生的點。
二分查找邏輯：

初始化兩個指針 i 和 j，分別代表搜索范圍的開始和結束。開始時，i 為 0，j 為 n。
在 i < j 的條件下循環執行。計算中點 h，并判斷 f(h) 的值。
如果 f(h) 為假（false），則說明滿足條件的索引在 h 的右側，將 i 設置為 h + 1。
如果 f(h) 為真（true），則說明滿足條件的索引可能是 h 或在 h 的左側，將 j 設置為 h。
這個過程不斷縮小搜索范圍，直到找到轉變點，即 f(i-1) 為假而 f(i) 為真的位置。

結果返回：當 i 與 j 相遇時，i 就是滿足 f(i) 為真的最小索引。如果整個范圍內沒有找到滿足條件的索引，則返回 n。

會在一個已排序的切片中進行二分查找(因此比線性搜索要快得多，尤其是在處理大型數據集時性能尤其明顯), 最后返回一個索引，這個索引指向切片中第一個不小于指定值的元素。如果沒有找到符合條件的元素，則返回的索引等于切片的長度。

使用時首先需要確保切片或數組已經是排序過的。其次需提供一個函數，這個函數定義了怎樣判斷切片中的元素是否滿足自定義的查找條件。

如下,假設有一個整數切片，已經按升序排序，想找到第一個不小于某個給定值的元素的位置。

package main

import (
 "fmt"
 "sort"
)

func main() {
 // 已排序的切片
 numbers := []int{1, 2, 4, 6, 8, 10}

 // 要查找的值
 target := 5

 // 使用 sort.Search
 index := sort.Search(len(numbers), func(i int) bool {
  return numbers[i] >= target
 })

 // 輸出結果
 if index < len(numbers) {
  fmt.Printf("找到了不小于 %d 的元素，索引為 %d，值為 %d\n", target, index, numbers[index])
 } else {
  fmt.Printf("沒有找到不小于 %d 的元素，索引為 %d\n", target, index)
 }
}

在線運行[3]

輸出:

找到了不小于 5 的元素，索引為 3，值為 6

上面代碼中,sort.Search() 用于在 numbers 切片中查找第一個不小于 5 的元素。由于 4 后面的元素是 6，所以函數返回 6 的索引，即 3。

更多例子:

Go語言中sort.Search()的使用方法（數組中通過值來取索引）[4]

golang 中sort包sort.search()使用教程[5]

sort.Find()

而sort.Find() 則首次提交[6]于2022年3月30日, 隨2022年8月2號發布的Go 1.19[7]一起發布

// Find uses binary search to find and return the smallest index i in [0, n)
// at which cmp(i) <= 0. If there is no such index i, Find returns i = n.
// The found result is true if i < n and cmp(i) == 0.
// Find calls cmp(i) only for i in the range [0, n).
//
// To permit binary search, Find requires that cmp(i) > 0 for a leading
// prefix of the range, cmp(i) == 0 in the middle, and cmp(i) < 0 for
// the final suffix of the range. (Each subrange could be empty.)
// The usual way to establish this condition is to interpret cmp(i)
// as a comparison of a desired target value t against entry i in an
// underlying indexed data structure x, returning <0, 0, and >0
// when t < x[i], t == x[i], and t > x[i], respectively.
//
// For example, to look for a particular string in a sorted, random-access
// list of strings:
//
// i, found := sort.Find(x.Len(), func(i int) int {
//     return strings.Compare(target, x.At(i))
// })
// if found {
//     fmt.Printf("found %s at entry %d\n", target, i)
// } else {
//     fmt.Printf("%s not found, would insert at %d", target, i)
// }
func Find(n int, cmp func(int) int) (i int, found bool) {
 // The invariants here are similar to the ones in Search.
 // Define cmp(-1) > 0 and cmp(n) <= 0
 // Invariant: cmp(i-1) > 0, cmp(j) <= 0
 i, j := 0, n
 for i < j {
  h := int(uint(i+j) >> 1) // avoid overflow when computing h
  // i ≤ h < j
  if cmp(h) > 0 {
   i = h + 1 // preserves cmp(i-1) > 0
  } else {
   j = h // preserves cmp(j) <= 0
  }
 }
 // i == j, cmp(i-1) > 0 and cmp(j) <= 0
 return i, i < n && cmp(i) == 0
}

這段代碼是一個實現二分查找的算法函數，名為 Find。它的目的是在一個滿足特定條件的有序集合中查找一個元素，并返回該元素的索引和一個布爾值，表示是否找到了該元素。以下是對該函數及其組成部分的詳細解釋：

函數定義：Find(n int, cmp func(int) int) (i int, found bool) 定義了一個名為 Find 的函數，它接受一個整數 n 和一個比較函數 cmp 作為參數。n 表示搜索范圍的大小，而 cmp 是一個用于比較元素的函數。函數返回兩個值：i（找到的元素的索引）和 found（一個布爾值，表示是否找到了元素）。
**比較函數 cmp**：這個函數接受一個整數索引作為輸入，并返回一個整數。返回值的意義是基于目標值 t 與索引 i 處元素的比較：如果 t 小于元素，則返回小于 0 的值；如果 t 等于元素，則返回 0；如果 t 大于元素，則返回大于 0 的值。
二分查找邏輯：

初始化兩個指針 i 和 j，分別指向搜索范圍的開始和結束。i 初始化為 0，j 初始化為 n。
循環執行，直到 i 不小于 j。在每次迭代中，計算中點 h，并使用 cmp 函數比較中點處的元素。
如果 cmp(h) 的結果大于 0，說明目標值 t 在中點的右側，因此將 i 更新為 h + 1。
如果 cmp(h) 的結果不大于 0，說明目標值 t 在中點或中點的左側，因此將 j 更新為 h。
這個過程不斷縮小搜索范圍，直到 i 和 j 相遇。

結果返回：當循環結束時，i 和 j 相等。這時，如果 i 小于 n 且 cmp(i) 等于 0，則說明找到了目標元素，函數返回 i 和 found 為 true。否則，表示沒有找到目標元素，函數返回 i（此時 i 表示目標值應該插入的位置，以保持序列的有序性）和 found 為 false。

Find uses binary search to find and return the smallest index i in [0, n) at which cmp(i) <= 0. If there is no such index i, Find returns i = n. The found result is true if i < n and cmp(i) == 0. Find calls cmp(i) only for i in the range [0, n).To permit binary search, Find requires that cmp(i) > 0 for a leading prefix of the range, cmp(i) == 0 in the middle, and cmp(i) < 0 for the final suffix of the range. (Each subrange could be empty.) The usual way to establish this condition is to interpret cmp(i) as a comparison of a desired target value t against entry i in an underlying indexed data structure x, returning <0, 0, and >0 when t < x[i], t == x[i], and t > x[i], respectively.

Find 使用二分查找來查找并返回 [0, n) 中 cmp(i) <= 0 的最小索引 i。如果不存在這樣的索引 i，Find 將返回 i = n。如果 i < n 且 cmp(i) == 0，則找到的結果為 true。Find 僅針對 [0, n) 范圍內的 i 調用 cmp(i)。

為了允許二分搜索，Find 要求 cmp(i) > 0 作為范圍的前導前綴，cmp(i) == 0 在中間，cmp(i) < 0 作為范圍的最后后綴。（每個子范圍可以為空。）建立此條件的常用方法是將 cmp(i) 解釋為所需目標值 t 與基礎索引數據結構 x 中的條目 i 的比較，返回 <0、0 和 > 當 t < x[i]、t == x[i] 和 t > x[i] 時，分別為 0。

圖片

二者實現非常相近,都有用二分搜索

Find的第二個入參,也是一個func,但要求這個func的返回值是int而不是bool.另外Find的返回值有兩個,第二個返回值是bool,代表沒有找到指定元素

sort.Find()[8] 看起來和sort.Search()很像,為什么有了Search還需要Find?二者有什么不同?

回溯一下Find的歷史, 提案sort: add Find #50340[9]是Go三位創始人之一的另一位Rob Pike[10]提出

從討論中[11]:

It sounds like we agree that sort.Search is hard to use, but there's not an obvious replacement. In particular, anything that can report an exact position has to take a different function (a 3-way result like strings.Compare, or multiple functions).

"聽起來我們都同意 sort.Search 很難使用，但沒有明顯的替代品。特別是，任何可以報告精確位置的東西都必須采用不同的函數（3 路結果，如 strings.Compare 或多個函數）"

slices: add Sort, SortStable, IsSorted, BinarySearch, and func variants #47619[12]

The difference between Search and Find in all cases would be that Find returns -1 when the element is not present,whereas Search returns the index in the slice where it would be inserted.

"在所有情況下，Search 和 Find 之間的區別在于，當元素不存在時，Find 返回 -1，而 Search 返回要插入元素的切片中的索引"

之所以Search大家感覺不夠好用,是因為如果要查找的元素沒在slice里面，Search會返回要插入元素的切片中的索引(并不一定是slice的長度,只有當元素比切片最后一個元素還大時,此時返回值才正好等于切片長度)，而絕對不是-1, 這樣就沒法判斷某個元素是否在切片中, 可看下面的例子:

package main

import (
 "fmt"
 "sort"
)

func main() {

 data := []int{1, 2, 3, 5, 10}

 target0 := 4
 index0 := sort.Search(len(data), func(i int) bool {
  return data[i] >= target0
 })
 fmt.Println(index0) //3

 target1 := 5
 index1 := sort.Search(len(data), func(i int) bool {
  return data[i] >= target1
 })
 fmt.Println(index1) //3

 target2 := 6
 index2 := sort.Search(len(data), func(i int) bool {
  return data[i] >= target2
 })
 fmt.Println(index2) //4

 target3 := 11
 index3 := sort.Search(len(data), func(i int) bool {
  return data[i] >= target3
 })
 fmt.Println(index3) //5

}

正如Rob Pike所說,他想用sort.Search來做搜索,判斷某個元素是否在(已排序的)切片中,但實際上,如target2的6, sort.Search()得到結果4, 是說如果把這個元素插入這個有序切片,需要插入在data[4]這個位置,并不一定是說data[4]=6

即通過sort.Search無法直接判斷某個元素是否在切片中,還需要補一個 data[index]和target是否相等的判斷

而Find則不然,

package main

import (
 "fmt"
 "sort"
 "strings"
)

func main() {

 x := []string{"a", "e", "h", "s", "v"}

 target := "e"
 i, found := sort.Find(len(x), func(i int) int {
  return strings.Compare(target, x[i])
 })
 if found {
  fmt.Printf("found %s at entry %d\n", target, i)
 } else {
  fmt.Printf("%s not found, would insert at %d\n", target, i)
 }

 target2 := "g"
 j, found2 := sort.Find(len(x), func(j int) int {
  return strings.Compare(target2, x[j])
 })
 if found2 {
  fmt.Printf("found %s at entry %d\n", target2, j)
 } else {
  fmt.Printf("%s not found, would insert at %d\n", target2, j)
 }

}

輸出:

found e at entry 1
g not found, would insert at 2

即不光返回了應該插入到哪個位置,還把目標元素是否在切片中也返回了~

一定程度上,Find似乎放在slices包更合適:

https://github.com/golang/go/issues/50340#issuecomment-1034071670

https://github.com/golang/go/issues/50340#issuecomment-1034341823

我同意將 sort 和 slices 包的接口解耦。前者適用于抽象索引，而后者適用于實際切片

sort.Find 和 slices.BinarySearchFunc

其實,我覺得很大程度,是因為slices.Contains性能不夠好(調用了slices.Index,即遍歷檢測), 而用sort.Find前提是一個已排序的切片,即可以用二分查找,肯定比用現在的slices.Contains暴力遍歷要好

但其實,Contains不需要知道index,只需要知道在不在,有沒有...可以考慮用map,雖然map的創建等需要一定的開銷,但是對于元素數量非常多的case,hash查找的O(1)的優勢就體現出來了~而且不需要切片的有序的

應該提一個提案,來優化現有的slices.Contains

Reference

[1]Robert Griesemer: https://github.com/griesemer

[2]sort.Search(): https://github.com/golang/go/blob/master/src/sort/search.go#L58

[3]在線運行: https://go.dev/play/p/6dlo5pwcNXy

[4]Go語言中sort.Search()的使用方法（數組中通過值來取索引）: https://blog.csdn.net/hty46565/article/details/109689515

[5]golang 中sort包sort.search()使用教程: https://studygolang.com/articles/14087

[6]首次提交: https://go-review.googlesource.com/c/go/+/396514

[7]2022年8月2號發布的Go 1.19: https://golang.google.cn/doc/devel/release

[8]sort.Find(): https://github.com/golang/go/blob/master/src/sort/search.go#L99

[9]sort: add Find #50340: https://github.com/golang/go/issues/50340

[10]Rob Pike: https://github.com/robpike

[11]討論中: https://github.com/golang/go/issues/50340#issuecomment-1016736447

[12]slices: add Sort, SortStable, IsSorted, BinarySearch, and func variants #47619: https://github.com/golang/go/issues/47619#issuecomment-915428658

責任編輯：武曉燕來源：旅途散記

Go Find 元素

成人免费xxxxx在线视频软件_久久精品久久久_亚洲国产精品久久久_天天色天天色_亚洲人成一区_欧美一级欧美三级在线观看