Search Algorithms and Efficiency

Please note, this is a adaptation of the student-facing pages and was designed for use in a brief conference session. The complete student-facing version is available on bjc.edc.org: Unit 5 Lab 1 Page 2: How Many Five-Letter Words Are There?, Unit 5 Lab 1 Page 3: Is "Seperate" Spelled Correctly?, and Unit 5 Lab 1 Page 4: Exactly How Much Faster Is Binary Search?.

On this page, you will compare the time required for binary search and for linear search.

Understanding Linear Search: How Many Five-Letter Words Are There? (about 5 minutes)

Students will have built a does () have () letters? block and used it with keep to find words of a specific length in a word list. The question here is: How long does a process like that take?

  1. Saving is optional for this workshop session.
    Click here to load this file. Then save it to your Snap! account.
    This project provides the variables 1,000 words, 10,000 words, and 100,000 words (each containing a list of words) and a copy of the does () have () letters? predicate.
  2. How many five-letter words are in the 10,000 words list?
  3. Click for some hints.

    The blank input slot in the predicate is required; this is where each item from the list goes as it is checked by the predicate.

    • You will need the keep block. Keep filters an input list according to the results of a predicate expression (a true/false question). keep (does ( ) have (5) letters?) from (10,000 words)
    • You also need the red length of block that finds the number of items in a list; this is different from the green length of block that finds the number of letters in a text string.
  4. Use the computation time of 'grey ring input slot' block provided in this project to determine how long it takes for the computer to answer that question.

The computation time block takes any reporter (with its inputs filled in), computes the result but ignores it, and instead reports how long it took to do the computation (in milliseconds).
computation time of (numbers from (1) to (1000)) reporting 27

In this example, it took 27 milliseconds to compute the list of integers from 1 to 1000. (The report you see will depend on how fast your computer is and what other programs are running on it.)

You can look inside computation time to see how it works: Right-click (or control-click on a mac) the block and select "edit..." from the menu that appears.

  1. Click on the computation time expression you just used to answer the previous question three more times and note the range of answers. They're not exactly the same because of other things that your computer is doing at the same time, so you should always take whatever result it gives you as approximate.
  2. Talk with Your Partner How long do you think it would take to count five-letter words if the list had 100,000 words?
  3. Use computation time to check using the 100,000 words list.

The only way to answer a "How many words..." problem is to check every single word in the dictionary. So if you have ten times as many words in the dictionary, it takes ten times as long to check them all.

: Linear Search or Sequential Search

Understanding Binary Search: Is "Seperate" Spelled Correctly? (about 5 minutes)

: Binary Search

A binary search algorithm starts in the middle of a sorted list and repeatedly eliminates half the list until either the desired value is found or all elements have been eliminated.

Linear search does a complete traversal of the list. Binary search saves time by doing a partial traversal of the list.

Checking spelling is simple in Snap! because you can just ask a dictionary: 100000-words-contains-seperate.png. But "simple" doesn't mean fast. The contains block still has to look at each item in the list until it finds the one you are looking for (and says true) or reaches the end of the list (and says false). It is still a linear search.

When you are only looking for one thing in a list (for example, checking whether a particular word is spelled correctly) rather than finding all the words with some characteristic (for example, looking for all five-letter words), the best strategy is a binary search algorithm: always guess the middle value and then adjust your guess based on whether it was too high or too low. That way, you eliminate half the possibilities with each guess. We can use a similar strategy to look for a word in a word list.

  1. Check whether "seperate" is spelled correctly by using binary search for () in 'list input slot' to look for the word in the sorted list 100,000 words (sorted).
  2. You can look inside the binary search block to see how it works.
  3. Try binary search with some words that you know are spelled correctly and some that you know are incorrect.
  4. Now use binary search to search for the same words in the unsorted list 100,000 words.
  5. AAP-2.P.2
  6. Talk with Your Partner Why does binary searching work on one list but not the other? Consider how the binary search algorithm works.

The contains block can't make any assumptions about the ordering in list you are searching, but if you are looking for one thing in a sorted, list you can use binary search.

AAP-2.P.2

Two things affect whether you can use a binary search algorithm to make your program more efficient:

  find one value find many values
sorted data can use binary search must use linear search
unsorted data must use linear search must use linear search

If you are working with short lists, it's not so important which algorithm you use. It's when you are dealing with a lot of data that you have to think carefully about the algorithm.

Comparing Algorithms: Exactly How Much Faster Is Binary Search? (about 10 minutes)

  1. AAP-2.P.3
    Locate the linear search for () in 'list input slot' block included in your project, and look inside it. Compare it to the binary search algorithm you used to count the number of five- or seven-letter words (starting in the middle of a sorted list and repeatedly eliminating half the list). The linear search block does the same computation as the binary search block, but it uses the linear search algorithm.
  2. Use computation time of 'grey ring input slot' to test how much time linear search takes to find the word "zebra" in each length list.
    Length of List Linear Search Time
    1000
    10,000
    100,000
    You can save a copy of this chart if you want a spreadsheet to fill out. Just start with the first empty column, "Linear Search Time," for now.
  3. Talk with Your Partner Look at the table. How would you describe what happens to the time as the input gets bigger?

The actual amount of physical time that it takes to solve a problem depends not only on your algorithm but also on how fast your computer is and what other programs you have running, etc. Therefore, computer scientists who want to measure the speed of an algorithm do it in terms of the number of steps. For example, what we really want to know about the efficiency of the linear search algorithm is how many times current item is compared to value (that is, how many times (current item) = (value) is called).

  1. AAP-2.P.3
    Add another column to your table. Assuming "zebra" is the last word in each word list, how many comparisons are made by the linear search algorithm for each length list?
    Length of List Linear Search Time Linear Search Steps
    1000
    10,000
    100,000
  2. Talk with Your Partner How would you describe what happens to the number of steps as the input list gets bigger? Write your hypothesis about the pattern.
  3. Does what happens with steps match what happens with time? That is, can you count steps as a measure of time?
AAP-4.A.3

The relationship between the input size and the number of steps required to solve a problem is the efficiency of the algorithm used to solve the problem.

AAP-4.A.4, AAP-4.A.5

Counting the number of steps, as you just did, is an informal, but perfectly good way to determine the efficiency of an algorithm. The formal measurement of an algorithm requires a mathematical proof.

In this course, you are mostly working with small problems, so it doesn't matter how efficient the algorithm is. But in the real world, programmers deal with lists of billions of items, so the efficiency of an algorithm can make a huge difference.
  1. AAP-2.P part a
    AAP-2.P.3
    Now, test how much time binary search takes to find the word "zebra" in the sorted lists, and determine how many comparisons are made by the algorithm if "zebra" is the last word in each length list.
    Length of List Binary Search Time Binary Search Steps
    1000
    10,000
    100,000
  2. Talk with Your Partner Describe what happens to the time and the number of steps as the input list gets bigger. Write down your hypothesis.
  3. AAP-2.P.3, AAP-4.A.6
    Look back at your tables for the linear search and the binary search algorithm, and compare the two search algorithms:
    1. Which has more blocks in its code?
    2. Which runs faster for large inputs?
    3. Which algorithm is more efficient?
  1. Wondering which search algorithm to use? Consider: are you searching for one thing or for all the things matching a characteristic?
    Suppose instead of counting five-letter words, we count the seven-letter words. Will this take... Write Out Your Thoughts
    1. More time because seven letters is more than five letters?
    2. Less time because there are fewer seven-letter words than five-letter words?
    3. The same time because it's not the word length that matters; it's the size of the dictionary?
  2. Experiment to find out for sure.