Substring matching

Post your nifty GP projects here!

Moderator: MSandro

Post Reply
mguzdial
Posts: 70
Joined: Sep 15th, '15, 11:21

Substring matching

Post by mguzdial » Jan 27th, '16, 23:10

My colleagues at Spelman wanted to see that we could do more traditional algorithms in GP, so I coded the below as an example:

Code: Select all

Get values for n and m, the size of the text and the pattern, respectively.
Get values for both the text T1T2…Tn and pattern matching P1P2…Pn.
Set k, the starting location for the attempted match, to 1.
While (k ≤ (n – m + 1)) do
            Set the value of i to 1.
            Set the value of MISMATCH to NO.
            While both (i ≤ m) and (MISMATCH = NO) do
                        If Pi ≠ Tk + (i-1)  then
                                    Set MISMATCH to YES
                        Else
                                    Increment i by 1 (to move to next character)
            End of loop
            If MISMATCH = NO then
                        Print the message “There is a match at position”
                        Print the value of k
            Increment k by 1
End of the loop
Stop
Attachments
patternmatch.gpp
(3.98 KiB) Downloaded 318 times

JohnM
Posts: 379
Joined: Sep 11th, '15, 14:42

Re: Substring matching

Post by JohnM » Feb 10th, '16, 20:18

Nice little project. I created a remix that is a less literal transcription of the pseudocode but, to me, reads a little more nicely. The two main changes were using "for" loop for the outer loop and using boolean values in the "match" variable.

I used your short lists of numbers to make sure it worked, then switched to searching for a word ("YOU?") in text of "The Cat In the Hat" by Theodor Geisel. That word is the final word in the text, so the match occurs at index 7424. You can search for other words that might or might not appear in the text: cat, python, sally, ship, day, mark. (Note: The text is all lowercase except for a few emphasized words like the final one.)

To switch from numbers to text merely required replacing the two "list" blocks with "letters" blocks on strings.

The teachers could encourage students to customize this by searching in some text of their own (e.g. one of the recipes they are using for the class). A possible extension of this project would be to collect a list of all the indices where a given word or phrase appears. For example the word "cat" appears 26 times in The Cat In the Hat and the word "and" appears 74 times.
patternmatchV2.gpp
(6.82 KiB) Downloaded 308 times

mguzdial
Posts: 70
Joined: Sep 15th, '15, 11:21

Re: Substring matching

Post by mguzdial » Feb 27th, '16, 15:27

Thanks, John! I've shared your idea (and project) with Yolanda and Jakita at Spelman.

Post Reply