CSC148 Stuff by Cem Yildirim: 2014

Friday 28 March 2014

An Alternative Approach to Assignment 2

As we all know, second part of assignment 2 was a total pain to code and especially, debug. After implementing the all_regex_permutations part by simply permuting the given string and checking whether each permutation is valid or not, I decided to try a different approach to make it faster since this implementation fails to complete in a sensible amount of time after a string which is longer than 12 characters is given as input. The first idea in my mind was to construct a really basic regex tree out of the given string and then permute the tree itself to get every possible variation of the regex. So my first attempt looked like this:

def all_regex_permutations(s):
    '''
    Return a SET of valid regex permutations of s

    >>> all_regex_permutations('(1|0)')
    {'(1|0)','(0|1)'}
    >>> all_regex_permutations('((01.))|e')
    {'(1|(e.0))', '(0.(e|1))', '((1|e).0)', '(e.(0|1))', '(e|(0.1))', '((0|1).e)', '(0|(1.e))', '(1|(0.e))', '(e|(1.0))', '((1.e)|0)', '(0.(1|e))', '((e|1).0)', '(e.(1|0))', '((e|0).1)', '((1|0).e)', '(1.(0|e))', '((0|e).1)', '((0.1)|e)', '((e.1)|0)', '((0.e)|1)', '(0|(e.1))', '(1.(e|0))', '((e.0)|1)', '((1.0)|e)'}
    '''
    starcount = 0
    leaves = []

    def addperm(x, l):
        return [l[0:i] + [x] + l[i:] for i in range(len(l) + 1)]

    def perm(l):
        if len(l) == 0:
            return [[]]
        return [x for y in perm(l[1:]) for x in addperm(l[0], y)]

    def sampletree(_str):
        if((_str.count('(') + _str.count(')')) ==
           2 * (_str.count('.') + _str.count('|'))):
            tree = []
            for _char in _str:
                if(_char in ['0', '1', '2', 'e']):
                    leaves.append(_char)
                    tree.append(Leaf('e'))
            for i in range(_str.count('.')):
                tree.append(DotTree(tree.pop(), tree.pop()))
            for i in range(_str.count('|')):
                tree.append(BarTree(tree.pop(), tree.pop()))
            if len(tree) == 1:
                return tree[0]
            raise ValueError()
        raise ValueError()

    def tostring(r):
        if isinstance(r, Leaf):
            return r.get_symbol()
        if isinstance(r, StarTree):
            return tostring(r.get_children()[0]) + '*'
        if isinstance(r, DotTree) or isinstance(r, BarTree):
            return ('(' + tostring(r.get_children()[0]) +
                    r.get_symbol() + tostring(r.get_children()[1]) + ')')

    def all_tree_permutations(r):
        permlist = []
        if isinstance(r, Leaf):
            permlist.append(r)
        if isinstance(r, BarTree) or isinstance(r, DotTree):
            firstchild = r.get_children()[0]
            secondchild = r.get_children()[1]
            for _fprm in all_tree_permutations(firstchild):
                for _sprm in all_tree_permutations(secondchild):
                    if isinstance(r, BarTree):
                        permlist.append(BarTree(_fprm, _sprm))
                        permlist.append(BarTree(_sprm, _fprm))
                    else:
                        permlist.append(DotTree(_fprm, _sprm))
                        permlist.append(DotTree(_sprm, _fprm))
        return permlist

    def addstars(strlst):
        lst = []
        for _str in strlst:
            for index, _char in enumerate(_str):
                if(_char in ['0', '1', '2', 'e', '*', ')']):
                    lst.append(_str[:index + 1] + '*' + _str[index + 1:])
        return lst

    def permuteleaves(strlst):
        lst = []
        for _str in strlst:
            for permedleaves in perm(leaves):
                s = _str
                for index, _char in enumerate(_str):
                    if _char == 'e':
                        s = (s[:index] + permedleaves.pop() + s[index + 1:])
                lst.append(s)
        return lst

    while s.find('*') != -1:
        starcount += 1
        s = s[:s.find('*')] + s[s.find('*') + 1:]
    try:
        firstree = sampletree(s)
    except:
        return set()
    strlist = [tostring(x) for x in all_tree_permutations(firstree)]
    strlist = permuteleaves(strlist)
    while starcount > 0:
        strlist = addstars(strlist)
        starcount -= 1
    return set(strlist)

To go through the code, addperm and perm methods are there to be used to permute any list or string you throw at them, then we sampletree method which creates a regex tree disregarding every syntax rule you had to use at build_regex_tree and parsing through every string like an insatiable monster that devours any string as a meal and exudes a regex tree. I use sampletree to create my initial regex tree which I then use to permute. We then have the tostring method, it is a really simple methods that gives you a string representation of any regex tree you present. Then we come to the heart of the code, all_tree_permutations method takes a string and then supposedly returns every possible permutation of it by iterating through each permutation of its children. It does not always work as intended and it needs a bit of rethinking. I also have an addstars method which adds stars to every possible place you can put it on. I am using addstars method as it gets rid of loads of complexity on the algorithm if you tried to just parse unary trees, I simply modify the string representation with every possible permutation of stars. Permuteleaves function essentially combines every possible representation of leaves with every other representation of trees therefore completing the problem. When I first drafted the code, I did not feel a need to use a method to permute the leaves since if I could find every possible combination of tree permutations, I would be done with the problem. After a bit of coding and trying a myriad of ways, I simply could not find an easy way to permute trees altogether with their leaves. Permuteleaves method is there for sake of convenience, if you can find a better way to merge two parts, please contact!

Anyways after fiddling with tree permutations for a while, I could not come up with a 100% complete solution therefore I moved onto another approach. This way, the solution is complete and the code is also a bit easier to follow. The basic idea is to use a variation of the addstars method on the code above to add each operator iteratively. Here is the code:

def all_regex_permutations(s: str) -> set:
    '''
    Return a set of valid regex permutations of s

    >>> len(all_regex_permutations('(1|0)'))
    2
    >>> len(all_regex_permutations('(0.1*)'))
    6
    >>> len(all_regex_permutations('(0.1*)*'))
    12
    >>> len(all_regex_permutations('((01.))|e'))
    24
    '''
    permset = set([])
    starcount = s.count('*')
    numcharacters = len(s)
    operatorlist = (['.' for i in range(s.count('.'))] +
                    ['|' for i in range(s.count('|'))])

    for char in s:
        if not char in ['0', '1', '2', 'e', '(', ')', '.', '|', '*']:
            return permset

    def allperms(_str):
        '''Return a generator of all permutations of a string'''
        if len(_str) <= 1:
            yield _str
        else:
            for perm in allperms(_str[1:]):
                for i in range(len(perm) + 1):
                    yield perm[:i] + _str[0:1] + perm[i:]

    def addoperator(strlst, operator):
        lst = []
        for _str in strlst:
            for index in range(1, len(_str)):
                if (
                    _str[index - 1] in ['0', '1', '2', 'e', ')'] and
                    _str[index] in ['0', '1', '2', 'e', '(']
                ):
                    startindex = index - 1
                    endindex = index + 1
                    balance = 1
                    if _str[index - 1] == ')':
                        while _str[startindex] != '(' or balance != 1:
                            if _str[startindex] == ')':
                                balance += 1
                            if _str[startindex] == '(':
                                balance -= 1
                                if balance == 1:
                                    break
                            startindex -= 1
                    balance = 2
                    if _str[index] == '(':
                        while _str[endindex] != ')' or balance != 1:
                            if _str[endindex] == '(':
                                balance += 1
                            if _str[endindex] == ')':
                                balance -= 1
                                if balance == 1:
                                    break
                            endindex += 1
                    lst.append(
                        _str[:startindex] +
                        '(' +
                        _str[startindex:index] +
                        operator +
                        _str[index:endindex] +
                        ')' +
                        _str[endindex:]
                    )
        return lst

    def addstars(strlst):
        lst = []
        for _str in strlst:
            for index, _char in enumerate(_str):
                if(_char in ['0', '1', '2', 'e', '*', ')']):
                    lst.append(_str[:index + 1] + '*' + _str[index + 1:])
        return lst

    for _char in ['(', ')', '*', '.', '|']:
        while s.find(_char) != -1:
            s = s[:s.find(_char)] + s[s.find(_char) + 1:]

    leafperms = [i for i in allperms(s)]
    for oplist in allperms(operatorlist):
        lst = leafperms[:]
        for operator in oplist:
            lst = addoperator(lst, operator)
        for i in range(starcount):
            lst = addstars(lst)
        permset |= set(lst)

    for i in permset:
        if numcharacters != len(i):
            return set([])

    return permset

In this version I still have a helper method that gives you the permutation of any list or string you throw at it.(Such convenience!) Then we have the main method that does the manual labor. Addoperator function accepts a list of strings and an operator then gives a list of strings in which the given operator is added to every possible part of each string in the list. The way it does this is by checking any possible indexes to insert an operator, and then checking the balance of parentheses to insert its respective opening and closing parentheses.(This part was hell to debug!) Then we still have our addstars method which I described earlier and that's it for helpers. Firstly, code strips the string of all the characters that are not leaves and then permutes them into a list. We iterate through every operator and add these to our permutation of leaves and that's it! I personally like this way so much that I stuck with this code instead working on the previous way to compute permutations, feel free to derive anything that's working from the tree implementation and thanks for reading!

Saturday 22 March 2014

Complexity and O(n) Notation

This week we went through calculating the complexity of a given function and finding its BigOh notation. This is the standard approach to estimate the efficiency of an algorithm. Since computers are quite fast, the problem with efficiency matters only on really large number of items, therefore BigOh notation is used whenever we assume a great number of items are given to the algorithm. In calculus you can think of this as taking the limit of a function as our parameters, called n from now on, approaches infinity. We essentially analyze and compare the growth rate of the algorithms. A great example is actually the two implementations I did of calculating the fibonacci sequence on a previous post. For the normal recursive solution, a simple call to calculate fibonacci(5) was actually running the function 15 extra times.

As our n gets pretty big, we can see that our recursive fibonacci algorithm would grow exponentially. Its complexity is O(((1+sqrt(5))/2)^n) to be precise and actually that base is the golden ratio! Another implementation that I showed on my previous post was using memoization to store previously calculated sequences so our number of calls would be 8, we nearly cut it in half!

As you can see from the illustration above, the memoization method gets rid of the redundant part of the problem and we just have to calculate and store the left part of our call tree. Hope you guys like the illustration I drew! Since we just have to calculate the leftmost branch of the tree to compute the sequence, our complexity becomes only O(n)! Much better eh?

Monday 17 March 2014

Binary Search Trees

This week we analyzed a specific version of binary trees called the binary search trees. They are quite similar with the addition of a simple rule. Every left child of the root node must have a smaller value than root's value and every right child has a bigger value than root's value. So a simple binary search tree would look like this.

This special version of binary tree has some fascinating properties. For example, to find the smallest or the highest value, you just have to go to leftmost or the rightmost node without even comparing any value! It is also terrific at finding a given value in the tree, hence its name. To find a value, you just compare it to a node starting from the root and then recursively search through left subtree if given value is smaller than node or scour the right subtree if the value is greater!

Thursday 6 March 2014

Yet Another Comparison of Queue Implementations

This week we had a lab on which students had to create an implementation of queue with linked lists. As I've done this out of curiosity couple of weeks ago I decided to up the ante in lab! On this post i will compare three different implementations of queues with graphs and timers only using python code and a great plotting library for python called Matplotlib. As I mentioned the native deque class in python as a really fast way to implement queues, I decided to give it a shot! Here are my three Queue classes that I tested:

Queue using List:

class Queue:

    def __init__(self):
        self._items = []

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        a = self._items[0]
        del self._items[0]
        return a

    def front(self):
        return self._items[0]

    def is_empty(self):
        return self._items == []

Queue using Linked List:

class Node:

    def __init__(self, data=None):
        self.data = data
        self.next = None

    def __str__(self):
        return str(self.data)


class Queue:

    def is_empty(self):
        if self.head and not self.curr:
            return False
        elif not self.curr:
            return True
        elif self.curr.next:
            return False
        else:
            return True

    def __init__(self):
        self.head = None
        self.curr = None
        self.tail = None

    def __iter__(self):
        return self

    def dequeue(self):
        return self.next().data

    def next(self):
        if self.head and not self.curr:
            self.curr = self.head
            return self.curr
        elif self.curr.next:
            self.curr = self.curr.next
            return self.curr
        else:
            raise StopIteration

    def enqueue(self, data):
        n = Node(data)
        if not self.head:
            self.head = n
            self.tail = n
        else:
            self.tail.next = n
            self.tail = self.tail.next

    def front(self):
        a = self.dequeue()
        self.enqueue(a)
        return a

Queue using Deque:

from collections import deque


class Queue:

    def __init__(self):
        self._deq = deque()

    def enqueue(self, item):
        self._deq.append(item)

    def dequeue(self):
        return self._deq.popleft()

    def front(self):
        a = self._deq.popleft()
        self._deq.append(a)
        return a

    def is_empty(self):
        try:
            self._deq.popleft()
        except IndexError:
            return True
        else:
            return False

Here is the code I used to time, test, compare and graph these three different implementations of Queues:

import time
import matplotlib.pyplot as plt
from csc148_queue import Queue as ListQueue
from csc148_queue_deque import Queue as DequeQueue
from csc148_queue_linked_list import Queue as LinkedListQueue
factor = 10000


def getduple(q, iternum):
    a = range(factor, factor * iternum, factor)
    b = []
    for i in a:
        b.append(time_queue(i, q))
    return a, b


def enqueue_dequeue(q, howmany):
    """Enqueue and dequeue 'howmany' items to/from Queue 'q'.
    """

    for i in range(howmany):
        q.enqueue(42)
        q.dequeue()


def time_queue(n, q):
    """Return how long it takes to enqueue and dequeue 'm' items to/from a
    Queue with 'n' items already in it.
    """

    for i in range(n):
        q.enqueue(1)

    start = time.time()
    enqueue_dequeue(q, 20000)
    end = time.time()

    return end - start
if __name__ == '__main__':
    firstqueue = getduple(ListQueue(), 12)
    secondqueue = getduple(LinkedListQueue(), 12)
    thirdqueue = getduple(DequeQueue(), 12)
    p1, = plt.plot(firstqueue[0], firstqueue[1], color='r', linewidth=2.0)
    p2, = plt.plot(secondqueue[0], secondqueue[1], color='b', linewidth=2.0)
    p3, = plt.plot(thirdqueue[0], thirdqueue[1], color='g', linewidth=2.0)
    plt.legend(
        [p1, p2, p3], ["List", "LinkedList", "Deque"], loc='upper left')
    plt.ylabel('Time')
    plt.xlabel('Number of Items')
    plt.show()

After running the code above you get a nice little graph of the time required for these queues against the number of items that are in the queue. Here is a little screenshot with a really high factor of 10000.

Here is a bit more zoomed in version on the difference between Deque and LinkedList:

As we can see, even though the Deque class outperforms LinkedList, the difference in their times is quite small even for relatively big number of items. Their difference in timings is smaller than 0.2 seconds for numbers as large as 110000 items. That's quite impressive!

Saturday 15 February 2014

Trees

This we started working on trees as our new data structure. The main advantage of trees is whenever you want to represent a system or data with a hierarchy, trees are your main tool to use. To give an example, mathematical expressions are one of the best ways to implement trees with. As each operator we use is comprised of a symbol and with left and right parts, we can use binary trees to represent them while using leaves to symbolize basic numbers. On this example, hierarchy is used to represent the priorities associated with the operations since (2+1)*3 is completely different than 2+(1*3). We also learned to traverse through trees with recursion since iterating through trees despite being doable, is much more lengthy to do.

Here are some SLOGS that are worth checking out:
http://fatmothinmycode.wordpress.com/
http://donthate148.blogspot.ca/
http://zccompsci.wordpress.com/
http://berkeycolortran.wordpress.com/

Wednesday 5 February 2014

Recursion and Fibonacci

This week was one of the slowest classes as it was more like a resting period to help students digest recursion. We continued recursion in lectures and Dan showed us how to break problems into smaller parts and simplify so we can solve them using recursion. We also had a lab involving recursions on which we solved some seemingly basic problems. It was a breeze to write the code with recursion otherwise we had to iterate through all the possibilities by using a variant of a for loop. The thing with recursion is, most problems can be solved without it but using recursions is an elegant and artistic way to solve problems as it saves code and coder's time despite being ever so slightly inefficient at times.

To give an example of inefficiency on recursion let's examine finding the fibonacci sequence given the index. You would normally solve this using recursion by having a base case and then modifying our call to the method.

def fibonacci(index):
    if index <= 1:
        return 1
    return fibonacci(index-1) + fibonacci(index-2)

Although the recursive implementation above is elegant, it is not very practical nor efficient. Calculating fibonacci(index) requires calculating two smaller methods, which in turn require two additional recursive calls each, and so on until all branches reach index=1. Hence, the time required to calculate fibonacci(index) is exponential.

Another way to do it while still using recursion is by a technique called memoization in which, instead of calling each smaller fibonacci method we already calculated, we store them so we can just use the already calculated data instead of going through all the recursion again. This is easily done in python using dictionaries.

def fibonacci(index):
    stored = {0: 1, 1: 1}
    if not (index in stored):
        stored[index] = fibonacci(index - 1) + fibonacci(index - 2)
    return stored[index]

In our dictionary we hold the initial values and whenever we need to find a value that is not yet in our dictionary we simply calculate it and add it to our dictionary therefore eliminating the need to ever calculate a value more than once.

Tuesday 28 January 2014

Queue Implementation with Linked Lists

As requested, here is my implementation of the Queue class with Linked Lists in Python. Although Python is quite a high level language to utilize a data type as simple as a Linked Lists, it still greatly improves the performance of a Queue class which uses lists. I used timequeue.py given in lab 2 to benchmark and compare the both implementations and here are the results:

-------Linked List Queue--------------------------------
Running time on 10000 items: 0.07587790489196777
Running time on 20000 items: 0.06835103034973145
Running time on 30000 items: 0.07256507873535156
Running time on 40000 items: 0.07469511032104492
Running time on 50000 items: 0.07767081260681152
Running time on 60000 items: 0.09958291053771973
Running time on 70000 items: 0.1073610782623291
Running time on 80000 items: 0.11011409759521484
Running time on 90000 items: 0.07159709930419922
Running time on 100000 items: 0.11853194236755371
----------------- List Queue--------------------------------
Running time on 10000 items: 0.3262810707092285
Running time on 20000 items: 0.6386370658874512
Running time on 30000 items: 0.948969841003418
Running time on 40000 items: 1.2549891471862793
Running time on 50000 items: 1.5696308612823486
Running time on 60000 items: 1.8530290126800537
Running time on 70000 items: 4.520461082458496
Running time on 80000 items: 5.268892049789429
Running time on 90000 items: 5.846363067626953
Running time on 100000 items: 6.589374780654907

Here is my code for the two cases:

Linked List Queue:

class Node:

    def __init__(self, data=None):
        self.data = data
        self.next = None

    def __str__(self):
        return str(self.data)


class Queue:

    def is_empty(self):
        if self.head and not self.curr:
            return False
        elif not self.curr:
            return True
        elif self.curr.next:
            return False
        else:
            return True

    def __init__(self):
        self.head = None
        self.curr = None
        self.tail = None

    def __iter__(self):
        return self

    def dequeue(self):
        return self.next().data

    def next(self):
        if self.head and not self.curr:
            self.curr = self.head
            return self.curr
        elif self.curr.next:
            self.curr = self.curr.next
            return self.curr
        else:
            raise StopIteration

    def enqueue(self, data):
        n = Node(data)
        if not self.head:
            self.head = n
            self.tail = n
        else:
            self.tail.next = n
            self.tail = self.tail.next

    def front(self):
        a = self.dequeue()
        self.enqueue(a)
        return a

List Queue:

class Queue:

    def __init__(self):
        self._items = []

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        a = self._items[0]
        del self._items[0]
        return a

    def front(self):
        return self._items[0]

    def is_empty(self):
        return self._items == []

Actually after a bit of research, the best way to use a Queue class would be to take advantage of the deque objects in collections package. I may implement and benchmark it if I ever find the time. For more info: http://docs.python.org/2/library/collections.html#collections.deque

Friday 24 January 2014

Except This! (Week 3)

Exceptions are the bane of every student's programming life. Until this week we only knew just a handful of them which would usually come up every time I ran a code (IndentationError, SyntaxError being the most common ones). Now, we can even create them! By learning how to create them we also started delving into the concepts of inheritance and OOP. This is when the fun begins!

I also would like to state how I implemented the Stack and Queue classes for the lab assignment. I basically used a binary tree consisting of duples as nodes for the Stack class and used linked list to implement the Queue class which resulted in better performance from the code. I also used an integer to solve the balanced parenthesis problem we discussed in class which was more efficient in space usage than a stack.

If you want more information about the implementations I would gladly send you the code! Til next time!

Yo check this sudoku solver written in python! http://norvig.com/sudoku.html

Friday 17 January 2014

Initial Post (Week 2)

This week we wrapped up lectures on queues and stacks. These data types are quite imperative in computer science discipline and their usage can be found on the simplest of programs, such as executing any recursive function. We implemented these types in Python using an easy but inefficient method that utilizes lists.

We also tried to write a method that found the size of a stack by only using the stack's methods. Many people found unique solutions which left us in wonder. This showed us the ubiquitous ways a programmer may approach a problem yet each solutions efficiency would still be in question.

In our first lab we tried to program a word counter that can also show us some statistics about our usage of certain words. Our TA, in his example, implemented it by using dictionaries to hold words and their frequencies. However I wanted to try a different approach so I used a duple list to store words and their frequencies. This approach had some problems that weren't encountered in the dictionary approach. First of all, as duples are immutable, when you encounter the same word again in the same content you just can't update the frequency in the duple. So I sorted my initial list of words before parsing to find frequency for each word by easily iterating over the list until I came up on a new word. Although, I'm not sure about the efficiency of my algorithm it is nice to come up with different approaches every now and then.

Til next time, take care!