Monday, March 16, 2015

Twitter bot Python implementation using the birdy Twitter API


Here is a little Python implementation of a Twitter bot using the awesome birdy API for Python.
I really like this API: it's simple, flexible and powerful, just what you need to build a simple Twitter bot at your convenience.

I won't go into much detail about the code, but here is what it does: the bot (more a proof of concept, really) parses a specific page (here I parse the links of the latest news on the Politika site) and posts them as links to Twitter.

It's simple: the class urlHTMLParser parses the page and, whenever it finds an anchor tag, collects all links related to the latest news (topics and links); then, using the birdy Twitter API, we simply tweet them at the intervals we define below.
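For a minimal, self-contained illustration of the parsing half on its own (written against Python 3's html.parser module rather than the Python 2 HTMLParser import used below; the class name and the sample HTML are my own):

```python
from html.parser import HTMLParser
import re

class LinkCollector(HTMLParser):
    """Collect (href, title) pairs from <a> tags whose href matches a pattern."""
    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern, re.IGNORECASE)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            dic = dict(attrs)
            if 'href' in dic and 'title' in dic and self.pattern.search(dic['href']):
                self.links.append({'href': dic['href'], 'title': dic['title']})

page = ('<a href="/vesti/najnovije-vesti/1.html" title="First">x</a>'
        '<a href="/sport/2.html" title="Second">y</a>')
parser = LinkCollector(r'najnovije-vesti')
parser.feed(page)
# parser.links now holds only the link whose href matched the pattern
```

Feeding it real page source instead of the sample string gives you the same list of dictionaries the bot below tweets from.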

Before testing the code, you have to obtain CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN and ACCESS_TOKEN_SECRET; you can obtain them from Twitter's application management page.
Note: most of the code is for parsing, as you'll see. If you just want to post some predefined tweet, that would look like so: 

        try:

            tweet = 'this is my simple tweet'

            # sending tweet ...
            response = client.api['statuses/update'].post(status = tweet)

            print 'Tweet has been sent.'
        except TwitterApiError as e:
            print e
And here is the full code, doing some parsing and then tweeting about it; check it out.

#!/usr/bin/python
from birdy.twitter import UserClient
from birdy.twitter import TwitterApiError
from HTMLParser import HTMLParser
import requests
import re
import time

class urlHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        # instance attribute, so every parser starts with a fresh list
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            dic = dict(attrs)
            if 'href' in dic and 'title' in dic:
                if re.search(r'najnovije-vesti', dic['href'], re.IGNORECASE) is not None:
                    self.links.append({ 'href' : dic['href'], 'title' : dic['title'] })
    def getLinks(self):
        return self.links

parser = urlHTMLParser()

client = UserClient(CONSUMER_KEY,
                    CONSUMER_SECRET,
                    ACCESS_TOKEN,
                    ACCESS_TOKEN_SECRET)

#response = client.api['users/show'].get(screen_name ='rdjordje')
#print response.data['description']
#print response.resource_url
#print response.headers


# politika request page - najnovije vesti 
# catch connection/proxy errors on the request
try:
    req = requests.get('http://www.politika.rs/vesti/najnovije-vesti/index.1.sr.html')
except requests.exceptions.RequestException as e:
    print e
    raise SystemExit(1)


if req.status_code == 200:
    print 'Status code: [OK]'
    print 'Parsing source page ...'
    parser.feed(req.text)
    parser.close()
    print 'Parsing complete, total latest news found: [', len(parser.getLinks()), ']'
    for l in parser.getLinks():
        try:

            print 'Sending tweet...'
            print 'Tweet : '
            tweet = l['title'] + ' - ' + l['href'] + ' #kratkinjuz'
            print tweet

            # sending tweet ...
            response = client.api['statuses/update'].post(status = tweet)

            print 'Tweet has been sent.'
        except TwitterApiError as e:
            print e
        
        # sleep for 5 minute[s]
        print 'Waiting for 5 minute[s]...'
        time.sleep(60 * 5)
        


Ok, this is the full code, doing something (relatively) useful. If you modify the code a little bit, you can parse any pages you find relevant and post some content on Twitter.

A nice improvement would be to run the script periodically from crontab instead of using time.sleep.

Another nice idea would be to write a script that checks the last activity of all your followers and takes action accordingly.
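Here is a sketch of the filtering half of that idea; the timelines themselves would still be fetched through the birdy client as above, and the helper name and data layout here are hypothetical, not part of any API:

```python
from datetime import datetime, timedelta

def inactive_followers(followers, now, days=30):
    """Given a list of {'screen_name': ..., 'last_tweet': datetime} dicts
    (as you might assemble from statuses/user_timeline responses),
    return the screen names with no tweet in the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [f['screen_name'] for f in followers if f['last_tweet'] < cutoff]

now = datetime(2015, 3, 16)
followers = [
    {'screen_name': 'active_user',  'last_tweet': datetime(2015, 3, 10)},
    {'screen_name': 'dormant_user', 'last_tweet': datetime(2014, 12, 1)},
]
# only dormant_user has been quiet for more than 30 days
```

The "take action" part (an unfollow, a reminder tweet, etc.) would then loop over the returned names with the client, same as the tweeting loop above.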

Eratosthenes sieve - generating primes in a range





When we want to find all prime numbers up to some integer N, an algorithm called the sieve of Eratosthenes comes into play. Why? Well, for one, it's super fast: it runs in O(n*log(log(n))) time and O(n) space. But before I explain the algorithm and show an implementation in C/C++, I would like to discuss some simple observations about generating prime numbers.

By definition, 2 is a prime.
A number k is prime if k is not divisible by any number (other than 1) up to k-1, i.e. gcd(k, x) = 1 for every x in the range [2, k-1].

So in order to check whether some number k is prime, we need to make sure that none of the numbers from 2 up to and including k-1 divide k.

case 1: -- (1)

bool isPrime(int k) {
    if (k == 1) return false;
    for (int i = 2; i < k; i++)
        if (k % i == 0)
            return false;
    return true;
}



For example k = 11
i = 2:  11 mod 2 = 1 but 1 != 0
i = 3:  11 mod 3 = 2 but 2 != 0
.
.
.
i = 10: 11 mod 10 != 0; since this is the last i in our loop, our function returns true, which is correct: 11 is a prime number.

Let's try some number which is not prime; let's try 4.
k = 4
i = 2: 4 mod 2 = 0, and the function returns false, so 4 is not prime.
Let's optimize our function; notice the following:

For k = 18, let's factor it.
18 = 2 * 9
18 = 3 * 6
18 = 6 * 3
18 = 9 * 2

Notice that after some point, by commutativity of the product, the pattern repeats. Let's check after which number, as a function of k, we can stop checking and be sure we've checked enough to claim no more checking is needed.

For k = 18, i has to go up to and including 3, which seems to be floor(sqrt(k)) - 1, but let's see what happens when k is the square of a prime number.

For k = 49
49 = 7*7; 7 is a prime, so the iterator i has to reach 7 in order to conclude that 49 is composite.

Now we see that in cases when k is a perfect square, our iterator i has to go up to and including sqrt(k).

So the range we have to check is i <= sqrt(k).

This is a nice improvement for identifying whether some k is prime.
In conclusion, our loop becomes:

for(i=2; i <= sqrt(k); i++)              -- (2)
.
.
.

Of course we can optimize further: apart from 2 itself, no even number can be a prime divisor, so once we've ruled out divisibility by 2 we don't need to check even divisors at all.
So our loop can start at 3 and go up to sqrt(k) in steps of 2, like this:

for(i=3; i <= sqrt(k); i+=2)               -- (3)

Our primality test can be optimized even further using the fact that every prime greater than 3 is of the form 6m ± 1 (m = 1, 2, 3, ...); this speeds up the test by roughly a factor of 3. In any case, the purpose of this text is to take another approach, sieving all primes up to some bound, so we will now look at the idea of the sieve of Eratosthenes.
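As an aside, such a 6m ± 1 test can be sketched in Python, combining it with the sqrt(k) bound from (2) (my own illustration, not code from the post):

```python
def is_prime(k):
    """Trial division checking only 2, 3 and numbers of the form 6m +/- 1."""
    if k < 2:
        return False
    if k < 4:              # 2 and 3 are prime
        return True
    if k % 2 == 0 or k % 3 == 0:
        return False
    i = 5                  # 5 = 6*1 - 1
    while i * i <= k:      # same sqrt(k) bound as variant (2)
        if k % i == 0 or k % (i + 2) == 0:   # i and i+2 cover 6m-1 and 6m+1
            return False
        i += 6
    return True
```

Each loop iteration tests two candidate divisors and then jumps 6, which is where the factor-of-3 saving over variant (2) comes from.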

However, before we do that, we should note something about the time complexity of the functions stated above.

Let's analyse the time complexity:

Note that a rigorous analysis of time complexity when dealing with prime numbers needs to take the size (number of digits) of the input into account. The analysis we present depends only on the value of k or n; that is, we assume that modulo and division operations take O(1) time, which is not true for very large numbers.

This should be taken with a grain of salt, since the assumption does not hold when dealing with big integers!

When checking if some number k is prime, in the first case we check every candidate divisor from 2 to k-1, so the time depends on how large k is; it depends linearly on k, and the time complexity is O(k) for a single k.

In the second case we check up to and including sqrt(k), so in big-O notation the time complexity is O(sqrt(k)).

What happens to the time complexity if we want to generate all primes up to some n?

Well, let's see: for every integer k <= n we need to apply one of our functions (naive or improved).
So in the first case, with the naive function, we have to call isPrime n-1 times (once for each k from 2 to n).
Let's see the pattern; for example, if n = 9:

isPrime(2) + isPrime(3) + ... + isPrime(8) + isPrime(9); in general the total work is [ sum(k) ] for k = 2 : n

[ (2 + 3 + ... + n-1 + n) ] and the asymptotic value of this sum is practically the sum of the first n natural numbers, which evaluates to n(n+1)/2, so the complete time complexity is of order O(n^2). Which is far too slow.

if we apply the discussed improved function, which checks only about sqrt(k) candidates, the time complexity is better; let's see by how much:

we have to call the improved function for every value k <= n, so the total work is

[ sum(sqrt(k)) ], k = 2 : n, i.e. [ sqrt(2) + sqrt(3) + ... + sqrt(n) ]

the time complexity of this version is a little better; in order to estimate it we need to know how to bound this sum of square roots. For that purpose check out this link: math.stackexchange.com. It comes out to order O(n^1.5).

In case 3 we skip even divisors, which halves the work but leaves the order at O(n^1.5); trial division by primes only would bring it down to roughly O(n^1.5/ln(n)).
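To make the difference concrete, here is a small illustrative experiment (my own addition, not from the post) that counts the trial divisions each variant performs over all k up to n:

```python
def count_divisions(n, use_sqrt):
    """Count the modulo operations done when primality-testing every k in [2, n]."""
    ops = 0
    for k in range(2, n + 1):
        i = 2
        # variant (2) stops at sqrt(k); variant (1) runs to k-1
        while i * i <= k if use_sqrt else i < k:
            ops += 1
            if k % i == 0:
                break          # found a divisor, k is composite
            i += 1
    return ops

naive = count_divisions(1000, use_sqrt=False)     # variant (1)
improved = count_divisions(1000, use_sqrt=True)   # variant (2)
# the sqrt bound cuts the work by roughly an order of magnitude already at n = 1000
```

Most of the naive cost comes from the primes themselves, where the loop runs all the way to k-1 without ever breaking.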

In order to improve the time complexity we need another approach. Instead of checking every k <= n for primality and printing it, we will assume all numbers up to n are prime and then discard all the composites. That leaves us with exactly the primes up to n. This is the central idea of the sieve of Eratosthenes.

Let's go back to definition:

2 is a prime, and none of its multiples is prime.
So 4, 6, 8, 10, 12, ... are not prime, and we discard them.

By the same logic we come to the next number, 3, and discard its multiples.
So 6, 9, 12, 15, ... are not prime, and we discard them too.

Notice that, as we go along, every next k which has not been discarded must be prime.

For example, when we discard the multiples of 2 and 3, the next number which is not discarded is 5, because 4 was discarded as a multiple of 2.

In this fashion we discard all the multiples up to n and we're left with the primes up to n.

Up to what k do we have to sieve? We know that for an upper bound n we only need to check divisors up to sqrt(n), and that is precisely how far we need to sieve to be sure we've discarded every composite number up to n.

The algorithm follows:

primes is an array which will receive the primes we find.

#include <stdbool.h>
#include <stdlib.h>

void primeSieve(int *primes, int n) {
    bool *sieve = (bool*)malloc((n + 1) * sizeof(bool));

    // assume 2 .. n are primes
    for (int i = 2; i <= n; i++)
        sieve[i] = true;

    sieve[0] = false;
    sieve[1] = false;

    // sieving up to sqrt(n) is enough (i*i <= n avoids calling sqrt)
    for (int i = 2; i * i <= n; i++) {
        if (!sieve[i])
            continue;

        // cross off the multiples of i; the first new one is i*i,
        // smaller multiples were already crossed off by smaller primes
        for (int j = i * i; j <= n; j += i)
            sieve[j] = false;
    }

    // collect the survivors into the array passed as a parameter;
    // this also picks up the primes greater than sqrt(n)
    int count = 0;
    for (int i = 2; i <= n; i++)
        if (sieve[i])
            primes[count++] = i;

    free(sieve);
}


The running time of this algorithm is O(n*log(log(n))), but measured against the size of the input (the number of digits of n) its complexity is exponential.
Space complexity is O(n).
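For readers following along in Python rather than C, the same sieve translates almost line for line; this sketch is my own addition:

```python
def prime_sieve(n):
    """Return all primes up to and including n, via the sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    i = 2
    while i * i <= n:                        # sieving up to sqrt(n) suffices
        if sieve[i]:
            for j in range(i * i, n + 1, i): # first new composite is i*i
                sieve[j] = False
        i += 1
    return [k for k in range(2, n + 1) if sieve[k]]
```

The final list comprehension plays the role of the collection loop in the C version, picking up the primes above sqrt(n) as well.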

Finally, when sieving primes, memory turns out to be a very important factor for large n, and so we have the segmented sieve variant.

The general idea is to sieve the primes in segments, as the name suggests.

[101  102 103 104 105 106 107 108 109 110] -- segment to be sieved. 

for each (precalculated) prime p, find the largest multiple of p that does not exceed the lower end m of the range: m - (m mod p); then cross off from there.

m = 101 , n = 110

for prime p = 2: 101 - 101 mod 2 = 100
now we discard all multiples of 2 after 100 and up to 110, i.e. in (100, 110]

we cross off 102, 104, ..., 110

for prime p = 3: 101 - 101 mod 3 = 99

we cross off 102, 105, 108

and so on, until we have used all the precalculated primes up to sqrt(n).
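The crossing-off procedure described above can be sketched in Python like so (a sketch of my own; the base primes up to sqrt(n) are precalculated with an ordinary sieve inside the function):

```python
def segmented_sieve(m, n):
    """Return the primes in [m, n], sieving the segment with base primes <= sqrt(n)."""
    # precalculate the base primes up to sqrt(n) with an ordinary sieve
    limit = int(n ** 0.5)
    base = [True] * (limit + 1)
    base_primes = []
    for i in range(2, limit + 1):
        if base[i]:
            base_primes.append(i)
            for j in range(i * i, limit + 1, i):
                base[j] = False

    # segment[k] stands for the number m + k
    segment = [True] * (n - m + 1)
    for p in base_primes:
        # first multiple of p inside the segment (and at least p*p,
        # so p itself survives when it falls inside [m, n])
        start = max(p * p, ((m + p - 1) // p) * p)
        for j in range(start, n + 1, p):
            segment[j - m] = False

    # 0 and 1 are not primes, in case the segment starts that low
    for k in range(m, min(2, n + 1)):
        segment[k - m] = False

    return [m + k for k, alive in enumerate(segment) if alive]
```

On the example segment [101, 110] this leaves exactly 101, 103, 107 and 109, matching the crossing-off done by hand above.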

To put your skills into practice with the segmented version of the sieve, check out the PRIME1 problem on SPOJ.

Hope that clears some things up for you. Happy programming!



Thursday, March 5, 2015

implementation of a binary search tree in python

Simply put, a binary tree is a data structure in which each node has at most two children.

A binary search tree, on the other hand, has some specific properties which enable this data structure to execute operations like insert, delete and lookup efficiently.

Generally, our binary search tree implementation satisfies the following:

  • One node is designated the root of the tree.
  • Each internal node contains a key and has two subtrees.
  • The leaves (final nodes) of the tree contain no key. Leaves are commonly represented by a special leaf or nil symbol, a NULL pointer, etc.
  • Each subtree is itself a binary search tree.
  • The left subtree of a node contains only nodes with keys strictly less than the node's key.
  • The right subtree of a node contains only nodes with keys strictly greater than the node's key.

Our implementation is a plain (unbalanced) search tree. It doesn't guarantee O(log(n)) efficiency in the worst case the way AVL trees (balanced trees) do.

Little about efficiency:


Insertion: the worst-case time complexity is of order O(n), which happens when the tree degenerates into a list.
An example would be inserting all elements in ascending or descending order:

| 1 2 3 4 12 525 535 ... OR 82 72 61 2 1 ...

However, in the average case the time complexity of insertion is of order O(log(n)), which is very good.

Same applies to delete operation and lookup operation.

Note: for the sake of precision, the search (lookup) operation is of order O(log(n)) in the average case. Insertion in itself requires us to find a position for the element (a search procedure) and then insert it, and the same goes for deletion. Linking in a node after we've found its place is of order O(1), constant time.


To implement this data structure in Python we will define two classes: class Node represents our nodes and the information every node contains, and class Tree implements the binary tree itself. We use classes since Python doesn't provide structs like C does.

The implementation is recursive, as is natural for a binary search tree by definition.

The Node class consists of the attributes l, r and v: l and r are "pointers" to the left and right child, and v is the value (key) the node contains.


class Node:
    def __init__(self, val):
        self.l = None
        self.r = None
        self.v = val


Now we define class Tree like so:

class Tree:
    def __init__(self):
        self.root = None

    def getRoot(self):
        return self.root

    def add(self, val):
        if(self.root == None):
            self.root = Node(val)
        else:
            self._add(val, self.root)

    def _add(self, val, node):
        if(val < node.v):
            if(node.l != None):
                self._add(val, node.l)
            else:
                node.l = Node(val)
        else:
            if(node.r != None):
                self._add(val, node.r)
            else:
                node.r = Node(val)

    def find(self, val):
        if(self.root != None):
            return self._find(val, self.root)
        else:
            return None

    def _find(self, val, node):
        if(val == node.v):
            return node
        elif(val < node.v and node.l != None):
            return self._find(val, node.l)
        elif(val > node.v and node.r != None):
            return self._find(val, node.r)

    def deleteTree(self):
        # garbage collector will do this for us. 
        self.root = None

    def printTree(self):
        if(self.root != None):
            self._printTree(self.root)

    def _printTree(self, node):
        if(node != None):
            self._printTree(node.l)
            print str(node.v) + ' '
            self._printTree(node.r)

The method printTree is for testing; in order to test this code we make an instance of our Tree class.
Here it is...


tree = Tree()
tree.add(3)
tree.add(4)
tree.add(0)
tree.add(8)
tree.add(2)
tree.printTree()
print (tree.find(3)).v
print tree.find(10)
tree.deleteTree()
tree.printTree()


Notice that, following the insertion order, our tree should look like this:
     3
    / \
   0   4
    \   \
    2    8

In the printTree method we've implemented an lRr traversal, meaning left child, root, right child (in-order).
For practice you can add other tree traversals and implement a function for deleting a specific node.
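As one example, a pre-order (root, left, right) traversal could be added like this (a sketch of my own, redefining Node so the snippet is self-contained and collecting values into a list instead of printing, so it runs unchanged on Python 2 and 3):

```python
class Node:
    def __init__(self, val):
        self.l = None
        self.r = None
        self.v = val

def preorder(node, out=None):
    """Root-Left-Right traversal, collecting values into a list."""
    if out is None:
        out = []
    if node is not None:
        out.append(node.v)
        preorder(node.l, out)
        preorder(node.r, out)
    return out

# build the example tree from the post by hand: 3 with children 0 and 4, etc.
root = Node(3)
root.l = Node(0)
root.r = Node(4)
root.l.r = Node(2)
root.r.r = Node(8)
```

A post-order variant is the same shape with the append moved after the two recursive calls.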

So the full code ready for testing:


#!/usr/bin/python

class Node:
    def __init__(self, val):
        self.l = None
        self.r = None
        self.v = val

class Tree:
    def __init__(self):
        self.root = None

    def getRoot(self):
        return self.root

    def add(self, val):
        if(self.root == None):
            self.root = Node(val)
        else:
            self._add(val, self.root)

    def _add(self, val, node):
        if(val < node.v):
            if(node.l != None):
                self._add(val, node.l)
            else:
                node.l = Node(val)
        else:
            if(node.r != None):
                self._add(val, node.r)
            else:
                node.r = Node(val)

    def find(self, val):
        if(self.root != None):
            return self._find(val, self.root)
        else:
            return None

    def _find(self, val, node):
        if(val == node.v):
            return node
        elif(val < node.v and node.l != None):
            return self._find(val, node.l)
        elif(val > node.v and node.r != None):
            return self._find(val, node.r)

    def deleteTree(self):
        # garbage collector will do this for us. 
        self.root = None

    def printTree(self):
        if(self.root != None):
            self._printTree(self.root)

    def _printTree(self, node):
        if(node != None):
            self._printTree(node.l)
            print str(node.v) + ' '
            self._printTree(node.r)


tree = Tree()
tree.add(3)
tree.add(4)
tree.add(0)
tree.add(8)
tree.add(2)
tree.printTree()
print (tree.find(3)).v
print tree.find(10)
tree.deleteTree()
tree.printTree()


Hope that helps if you had problems implementing a binary search tree in Python!