Monday, March 16, 2015

Twitter bot implementation in Python using the birdy Twitter API


Here is a little Python implementation of a Twitter bot using the awesome birdy API for Python.
I really like this library; it's simple, flexible and powerful, just what you need to build a simple Twitter bot at your convenience.

I won't go into much detail about the code, but here is what it does: the bot, more of a proof of concept really, parses a specific page (here I parse the links to the latest news on the Politika site) and posts them as links to Twitter.

It's simple: the class urlHTMLParser parses the page and, whenever it finds an anchor tag, collects all links related to the latest news (titles and links); then, using the birdy Twitter API, we simply tweet them at the intervals defined below.

Before testing the code, you have to obtain CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN and ACCESS_TOKEN_SECRET; you can obtain this information here.
Note: most of the code is for parsing, as you'll see. If you just want to post some predefined tweet, that would look like this:

try:
    tweet = 'this is my simple tweet'

    # sending tweet ...
    response = client.api['statuses/update'].post(status=tweet)

    print 'Tweet has been sent.'
except TwitterApiError as e:
    print e
And here is the full code, doing some parsing and then tweeting about it; check it out.

#!/usr/bin/python
from birdy.twitter import UserClient
from birdy.twitter import TwitterApiError
from HTMLParser import HTMLParser
import requests
import re
import time

class urlHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        # collected links, one dict per found article
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            dic = dict(attrs)
            if 'href' in dic and 'title' in dic:
                # keep only anchors pointing to the latest-news section
                if re.search(r'najnovije-vesti', dic['href'], re.IGNORECASE) is not None:
                    self.links.append({'href': dic['href'], 'title': dic['title']})

    def getLinks(self):
        return self.links

parser = urlHTMLParser()

# fill in the credentials you obtained from Twitter
client = UserClient(CONSUMER_KEY,
                    CONSUMER_SECRET,
                    ACCESS_TOKEN,
                    ACCESS_TOKEN_SECRET)

#response = client.api['users/show'].get(screen_name ='rdjordje')
#print response.data['description']
#print response.resource_url
#print response.headers


# request the Politika latest-news page
# catch connection problems (e.g. a bad proxy) so req is never left undefined
try:
    req = requests.get('http://www.politika.rs/vesti/najnovije-vesti/index.1.sr.html')
except requests.exceptions.RequestException as e:
    print e
    raise SystemExit(1)


if req.status_code == 200:
    print 'Status code: [OK]'
    print 'Parsing source page ...'
    parser.feed(req.text)
    parser.close()
    print 'Parsing complete, total latest news found: [', len(parser.getLinks()), ']'
    for l in parser.getLinks():
        try:

            print 'Sending tweet...'
            print 'Tweet : '
            tweet = l['title'] + ' - ' + l['href'] + ' #kratkinjuz'
            print tweet

            # sending tweet ...
            response = client.api['statuses/update'].post(status = tweet)

            print 'Tweet has been sent.'
        except TwitterApiError as e:
            print e
        
        # sleep for 5 minutes between tweets
        print 'Waiting for 5 minute[s]...'
        time.sleep(60 * 5)
        


Ok, this is the full code doing something (relatively) useful. If you modify the code a little, you can parse any relevant pages you want and post their content on Twitter.

A nice improvement would be to run the script periodically via crontab instead of using time.sleep, as sketched below.
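For example, a crontab entry along these lines would run the bot every 5 minutes (the script path is hypothetical; adjust it to wherever you saved the bot):

*/5 * * * * /usr/bin/python /path/to/twitter_bot.py

With cron doing the scheduling, the loop in the script could tweet just one link per run and exit, instead of sleeping.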

Another nice idea would be to write a script that checks the last activity of all your followers and acts accordingly; a rough sketch of that follows.
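Here is a rough, untested sketch of that idea reusing the same birdy client; it assumes the standard Twitter 1.1 endpoints followers/ids and statuses/user_timeline, and the 30-day cutoff is an arbitrary choice of mine:

from datetime import datetime, timedelta

def inactive_followers(client, days=30):
    cutoff = datetime.utcnow() - timedelta(days=days)
    inactive = []
    # ids of everyone following the authenticated account
    response = client.api['followers/ids'].get()
    for follower_id in response.data['ids']:
        try:
            # the single most recent tweet is enough here
            timeline = client.api['statuses/user_timeline'].get(user_id=follower_id, count=1)
        except TwitterApiError:
            continue  # protected account or similar, skip it
        if not timeline.data:
            inactive.append(follower_id)  # never tweeted at all
            continue
        created = timeline.data[0]['created_at']
        last = datetime.strptime(created, '%a %b %d %H:%M:%S +0000 %Y')
        if last < cutoff:
            inactive.append(follower_id)
    return inactive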

Eratosthenes sieve - generating primes in a range





When we want to find all prime numbers up to some integer N, an algorithm called the sieve of Eratosthenes comes into play. Why? Well, for one, it's super fast: it runs in O(n*log(log(n))) time and O(n) space. But before I explain the algorithm and show an implementation in C, I would like to discuss some simple observations about generating prime numbers.

By definition, 2 is a prime.
A number k is prime if it is not divisible by any number (other than 1) up to k-1; equivalently, gcd(k, x) = 1 for every x in the range [2, k-1].

So in order to check whether some number k is prime, we need to make sure that no number from 2 up to and including k-1 divides k.

case 1: -- (1)

bool isPrime(int k) {
    if (k == 1) return false;
    for (int i = 2; i < k; i++)
        if (k % i == 0)
            return false;
    return true;
}



For example k = 11
i = 2:  11 mod 2 = 1, and 1 != 0
i = 3:  11 mod 3 = 2, and 2 != 0
...
i = 10: 11 mod 10 != 0; since this is the last i in our loop, the function returns true, which is correct: 11 is a prime number.

Let's try some number which is not prime, say 4.
k = 4
i = 2: 4 mod 2 = 0, and the function returns false, so 4 is not prime.

Let's optimize our function. Notice the following:

For k = 18, let's factor it:
18 = 2 * 9
18 = 3 * 6
18 = 6 * 3
18 = 9 * 2

Notice that after some point, by commutativity of the product, the pattern repeats. Let's check after which number, as a function of k, we can stop and be sure we've checked enough to claim no more checking is needed.

For k = 18, i only has to go up to and including 3, which seems to be floor(sqrt(k)) - 1, but let's see what happens when k is the square of a prime number.

For k = 49
49 = 7*7, and 7 is a prime, so the iterator i has to reach 7 in order to conclude that 49 is composite.

Now we see that when k is a perfect square, our iterator i has to go up to and including sqrt(k).

So the range we have to check is i <= sqrt(k).

This is a nice improvement for identifying whether some k is prime.
In conclusion, our loop becomes:

for(i=2; i <= sqrt(k); i++)              -- (2)
.
.
.

Of course we can optimize further: once we've checked divisibility by 2 separately, no even number can divide an odd k, so we don't need to test even divisors at all.
So our loop can start at 3 and go up to sqrt(k) in steps of 2, like this:

for(i=3; i <= sqrt(k); i+=2)               -- (3)

Our primality test can be optimized even further using the fact that any prime number greater than 3 is of the form 6k ± 1; only two residues out of every six survive, so we test about a third of all candidates. In any case, the purpose of this text is to take another approach, namely sieving primes up to some bound, so now we'll look into the idea of the Eratosthenes sieve.
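To make that concrete, here is a small sketch of such a trial-division test, written in Python for brevity (my own illustration, not one of the C snippets above):

def is_prime(k):
    if k < 2:
        return False
    if k < 4:
        return True              # 2 and 3 are prime
    if k % 2 == 0 or k % 3 == 0:
        return False
    # every remaining candidate divisor is of the form 6m - 1 or 6m + 1
    i = 5
    while i * i <= k:
        if k % i == 0 or k % (i + 2) == 0:
            return False
        i += 6
    return True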

However, before we do that, we should note something about the time complexity of the stated functions.

Let's analyse the time complexity:

Note that a rigorous analysis of time complexity when dealing with prime numbers needs to take the size (number of bits) of the input into account. The analysis we present depends on the value of k or n alone; that is, we assume modulo and division operations take O(1) time, which is not true for very large numbers.

So take this with a grain of salt, since the assumption breaks down when dealing with big integers!

When checking whether some number k is prime, in the first case we need to check from 2 to k-1, so the time depends on how large k is, and it depends linearly on k; the time complexity is O(k) for a single k.

In the second case we check up to and including sqrt(k), so in big-O notation the time complexity is O(sqrt(k)).

What happens to the time complexity if we want to generate all primes up to some n?

Well, let's see: for every integer k <= n we need to apply one of our functions (naive or improved).
So in the first case, with the naive function, we have to call isPrime n-1 times, once for each k from 2 to n.
Let's see the pattern, for example for n = 9:

isPrime(2) + isPrime(3) + ... + isPrime(8) + isPrime(9); in general the total work is 2 + 3 + ... + n.

That is essentially the sum of the first n natural numbers, which evaluates to n(n+1)/2, so the complete time complexity is of order O(n^2). Which is far too slow.

If we apply the improved function, which checks only about sqrt(k) candidates, the time complexity is better; let's see by how much:

again we call the function once for every k <= n, so the total work is sqrt(2) + sqrt(3) + ... + sqrt(n).

In order to estimate this we need to know how such a sum of roots grows; for that purpose check out this link: math.stackexchange.com. It comes out to roughly (2/3)*n^1.5, so this version is of order O(n^1.5).

Case 3 only halves that work, which is a constant factor. If we went further and divided only by the primes up to sqrt(k) (of which, by the prime number theorem, there are about sqrt(k)/ln(sqrt(k))), the complexity would improve by a logarithmic factor, to about O(n^1.5/ln(n)).

In order to improve the time complexity we need another approach. Instead of checking, for every k <= n, whether it is prime and printing it, we will assume all numbers up to n are prime and then discard all composites. That will leave us with all the primes up to n. This is the central idea of the Eratosthenes sieve algorithm.

Let's go back to the definition:

2 is a prime. So, reasoning along: if 2 is prime, then none of its multiples are prime.
So 4, 6, 8, 10, 12 ... are not, and we discard them.

By the same logic we come to the next number, 3, and discard its multiples.
So 6, 9, 12, 15 ... are not prime, so we discard them too.

Notice that as we go along, every next k which has not been discarded must be prime.

For example, when we discard the multiples of 2 and 3, the next number which is not discarded is 5, because 4 was discarded as a multiple of 2.

In this fashion we discard all multiples up to n and we're left with the primes up to n.

Up to what k do we have to sieve? Well, we know that for an upper bound n we only need to check divisors up to sqrt(n), and that is precisely how far we need to sieve: every composite number up to n has a prime factor of at most sqrt(n), so crossing off multiples of the primes up to sqrt(n) discards all composite numbers up to n.

The algorithm follows:

primes is an array which will receive the primes up to n; the function returns how many it found.

#include <stdbool.h>
#include <stdlib.h>

int primeSieve(int *primes, int n) {
    bool *sieve = malloc((n + 1) * sizeof(bool));

    // assume 2 .. n are prime
    for (int i = 2; i <= n; i++)
        sieve[i] = true;

    sieve[0] = false;
    sieve[1] = false;

    int count = 0;
    for (int i = 2; i <= n; i++) {
        if (!sieve[i])
            continue;

        // i survived the sieve, so it is prime;
        // put it in the array passed as a parameter
        primes[count++] = i;

        // cross off multiples of i starting from i*i; smaller multiples
        // were already crossed off by smaller primes (the inner loop
        // does nothing once i exceeds sqrt(n))
        for (long long j = (long long)i * i; j <= n; j += i)
            sieve[j] = false;
    }

    free(sieve);
    return count;
}


The running time of this algorithm is O(n*log(log(n))), but measured against the size of the input (the number of bits of n) its complexity is exponential.
Space complexity is O(n).

Finally, when sieving primes, memory turns out to be a very important factor for large n, and so we have something called the segmented sieve version.

The general idea is to sieve primes in segments, as the name suggests.

[101 102 103 104 105 106 107 108 109 110] -- segment to be sieved.

For each precalculated prime p (the primes up to sqrt(n)), find the first multiple of p that falls inside the segment, then cross off every p-th number from there.

m = 101, n = 110

For prime p = 2: 101 - 101 mod 2 = 100, which is below m, so the first multiple inside the segment is 100 + 2 = 102.

We cross off 102, 104, 106, 108, 110.

For prime p = 3: 101 - 101 mod 3 = 99, so the first multiple inside the segment is 99 + 3 = 102.

We cross off 102, 105, 108.

And so on ... until we have processed all the precalculated primes.
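Here is a minimal sketch of the segmented sieve in Python, as an illustration of the walkthrough above (it assumes 2 <= m <= n):

def segmented_sieve(m, n):
    # 1) plain sieve for the small primes up to sqrt(n)
    limit = int(n ** 0.5)
    small = [True] * (limit + 1)
    small[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if small[i]:
            for j in range(i * i, limit + 1, i):
                small[j] = False
    primes = [i for i in range(2, limit + 1) if small[i]]

    # 2) sieve just the segment [m, n], crossing off multiples of
    #    every small prime, exactly as in the example above
    segment = [True] * (n - m + 1)
    for p in primes:
        # first multiple of p inside the segment, but never p itself,
        # so start no lower than p*p
        start = max(p * p, ((m + p - 1) // p) * p)
        for j in range(start, n + 1, p):
            segment[j - m] = False
    return [m + i for i, is_p in enumerate(segment) if is_p]

For the example above, segmented_sieve(101, 110) returns [101, 103, 107, 109].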

To put your skills to practice with the segmented version of the sieve, check out the PRIME1 problem at SPOJ.

Hope that clears up some things for you, happy programming.   



Thursday, March 5, 2015

implementation of binary search tree in python

Simply put, a binary tree is a data structure in which each node has at most two children.

A binary search tree, on the other hand, has some specific properties which enable it to execute operations like insert, delete, lookup ... efficiently.

Generally, our binary search tree implementation satisfies the following:

  • One node is designated the root of the tree.
  • Each internal node contains a key and has two subtrees.
  • The leaves (final nodes) of the tree contain no key. Leaves are commonly represented by a special leaf or nil symbol, a NULL pointer, etc.
  • Each subtree is itself a binary search tree.
  • The left subtree of a node contains only nodes with keys strictly less than the node's key.
  • The right subtree of a node contains only nodes with keys strictly greater than the node's key.

Our implementation is a plain (unbalanced) search tree. It doesn't guarantee O(log(n)) efficiency in the worst case the way AVL trees (balanced trees) do.

A little about efficiency:


Insertion - worst-case time complexity is of order O(n), which happens when the tree degenerates into a list.
An example would be inserting all elements in ascending or descending order:

| 1 2 3 4 12 525 535 ... OR 82 72 61 2 1 ...

However, in the average case the time complexity of insertion is of order O(log(n)), which is very good.

The same applies to the delete and lookup operations.

Note: actually, for the sake of precision, the search (lookup) operation is of order O(log(n)) in the average case. Insertion in itself requires us to find a position for the new element (a search procedure) and then insert it; the same goes for deletion. Linking in a node after we've found its place is of order O(1), constant time.


To implement this data structure in Python we will define two classes: class Node, representing our nodes and the information every node contains, and class Tree, implementing the binary search tree itself. We use classes since Python doesn't provide structs like C does.

The implementation is recursive, as is natural for a binary search tree by definition.

The Node class consists of the attributes l, r and v: l and r are the "pointers" to the left and right child, and v is the value or key the node contains.


class Node:
    def __init__(self, val):
        self.l = None
        self.r = None
        self.v = val


Now we define class Tree like so:

class Tree:
    def __init__(self):
        self.root = None

    def getRoot(self):
        return self.root

    def add(self, val):
        if(self.root == None):
            self.root = Node(val)
        else:
            self._add(val, self.root)

    def _add(self, val, node):
        if(val < node.v):
            if(node.l != None):
                self._add(val, node.l)
            else:
                node.l = Node(val)
        else:
            if(node.r != None):
                self._add(val, node.r)
            else:
                node.r = Node(val)

    def find(self, val):
        if(self.root != None):
            return self._find(val, self.root)
        else:
            return None

    def _find(self, val, node):
        if(val == node.v):
            return node
        elif(val < node.v and node.l != None):
            return self._find(val, node.l)
        elif(val > node.v and node.r != None):
            return self._find(val, node.r)

    def deleteTree(self):
        # garbage collector will do this for us. 
        self.root = None

    def printTree(self):
        if(self.root != None):
            self._printTree(self.root)

    def _printTree(self, node):
        if(node != None):
            self._printTree(node.l)
            print str(node.v) + ' '
            self._printTree(node.r)

The method printTree is for testing; in order to test this code we make an instance of our Tree class.
Here it is...


tree = Tree()
tree.add(3)
tree.add(4)
tree.add(0)
tree.add(8)
tree.add(2)
tree.printTree()
print (tree.find(3)).v
print tree.find(10)
tree.deleteTree()
tree.printTree()


Notice that, following the insertion order, our tree should look like this:
     3
    / \
   0   4
    \   \
    2    8

In the printTree method we've implemented lRr traversal, meaning left child, root, right child (in-order).
For practice you can add other tree traversals and implement a function that deletes a specific node; a sketch of one extra traversal follows.
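For example, a pre-order traversal (root, then left child, then right child) is a small addition to the Tree class; this sketch is my own illustration, not part of the original listing:

    def printTreePreOrder(self):
        if(self.root != None):
            self._printTreePreOrder(self.root)

    def _printTreePreOrder(self, node):
        if(node != None):
            print str(node.v) + ' '          # visit the root first
            self._printTreePreOrder(node.l)  # then the left subtree
            self._printTreePreOrder(node.r)  # then the right subtree

For the tree drawn above, this would print 3 0 2 4 8.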

So the full code ready for testing:


#!/usr/bin/python

class Node:
    def __init__(self, val):
        self.l = None
        self.r = None
        self.v = val

class Tree:
    def __init__(self):
        self.root = None

    def getRoot(self):
        return self.root

    def add(self, val):
        if(self.root == None):
            self.root = Node(val)
        else:
            self._add(val, self.root)

    def _add(self, val, node):
        if(val < node.v):
            if(node.l != None):
                self._add(val, node.l)
            else:
                node.l = Node(val)
        else:
            if(node.r != None):
                self._add(val, node.r)
            else:
                node.r = Node(val)

    def find(self, val):
        if(self.root != None):
            return self._find(val, self.root)
        else:
            return None

    def _find(self, val, node):
        if(val == node.v):
            return node
        elif(val < node.v and node.l != None):
            return self._find(val, node.l)
        elif(val > node.v and node.r != None):
            return self._find(val, node.r)

    def deleteTree(self):
        # garbage collector will do this for us. 
        self.root = None

    def printTree(self):
        if(self.root != None):
            self._printTree(self.root)

    def _printTree(self, node):
        if(node != None):
            self._printTree(node.l)
            print str(node.v) + ' '
            self._printTree(node.r)


tree = Tree()
tree.add(3)
tree.add(4)
tree.add(0)
tree.add(8)
tree.add(2)
tree.printTree()
print (tree.find(3)).v
print tree.find(10)
tree.deleteTree()
tree.printTree()


Hope that helps if you had some problems implementing a binary search tree in Python!

Saturday, September 6, 2014

Geojson adjusted data-sets for Serbia and countries in the region with d3.js map examples.

 



GeoJSON adjusted data-sets for Serbia, Croatia, Slovenia, Montenegro, Bosnia and Herzegovina and Macedonia, with d3.js map examples (data adjusted from Natural Earth).

The motivation for this mini project was the lack of official shapefile (shp) data-sets and the need to do some visualizations with the d3.js library. The final results, with complete map examples, can be found at serbia geojson - my site on the faculty servers. These adjusted data-sets are useful if you want to do some kind of visualization with d3.js, so feel free to use the full map examples, though a backlink to this site or my faculty site is appreciated.

As for the Linux part of this post, the tool used to convert shapefiles to GeoJSON is ogr2ogr from the GDAL project; an example invocation is shown below.
Also, if you're looking to convert GeoJSON to TopoJSON, the tool you're looking for is the topojson lib from node.js [ sudo npm install topojson ], assuming you have npm (the node package manager) and node.js installed.
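For reference, the two conversions look roughly like this (the file names here are just placeholders):

ogr2ogr -f GeoJSON serbia.geojson serbia.shp
topojson -o serbia.topo.json serbia.geojson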

Once again, the full d3.js examples and adjusted data-sets for Serbia and the countries in the region can be found here.

Note: There is a fun dataset of former Yugoslavia for the Yugo-nostalgic people. :)

see you soon. 
Djordje

Thursday, August 28, 2014

soft lockup - cpu#0 stuck for n s

CPU soft lockup [ubuntu] - microcode 



If you're experiencing problems with soft lockups of your CPU, this may be due to old BIOS firmware and a CPU instruction set incompatible with your motherboard firmware. Motherboard manufacturers are reluctant to issue BIOS updates unless there is a major issue, so your CPU microcode can be quite dated. To fix this issue on Ubuntu (usually CPU lockups) there are Intel and AMD microcode packages which are loaded at boot time, without permanent BIOS changes.

For users owning Intel processors, you can install the microcode like so:

sudo apt-get install intel-microcode microcode.ctl


For AMD users:

sudo apt-get install amd64-microcode  


To check whether the microcode has been loaded:

dmesg | grep microcode


For Intel processors the output should look something like this:



[    0.000000] CPU0 microcode updated early to revision 0x29, date = 2013-06-12
[    0.089933] CPU1 microcode updated early to revision 0x29, date = 2013-06-12
[    1.263975] microcode: CPU0 sig=0x206a7, pf=0x10, revision=0x29
[    1.263983] microcode: CPU1 sig=0x206a7, pf=0x10, revision=0x29
[    1.263991] microcode: CPU2 sig=0x206a7, pf=0x10, revision=0x29
[    1.263999] microcode: CPU3 sig=0x206a7, pf=0x10, revision=0x29
[    1.264074] microcode: Microcode Update Driver: v2.00 
<tigran@aivazian.fsnet.co.uk>, Peter Oruba


  


Note: Don't apply this fix unless you're experiencing CPU lockup problems. (If this does not correct your problem, check your CPU's stability and check the power supply as well.)




Tuesday, June 24, 2014

Enable compression and leverage browser caching of your site

Have you done a page speed test of your web pages lately? There is a nice tool developed by Google for developers and webmasters which can help you find out whether there is room for speed optimization of your web pages. The tool is called PageSpeed Insights and you can gather valuable information by running this test.
  



The information you gather from the analysis will point you in the right direction as to what can be optimized on your pages. These factors probably play some role in overall SERP position (the ranking of your page on Google); to what degree, nobody knows, but if there is even a little edge in SEO, and it's quick to optimize, why not do it? There is nothing to lose and a lot to gain in terms of page speed, and consequently in user experience, which is a valued thing in the eyes of Google. (Why else would they show an estimate of user experience in the page analysis?)

Here we will do a quick speed optimization: browser caching and gzip/deflate compression.

These two play a big role in the overall speed estimate, as you will see by testing again after you've optimized your pages.

Leverage browser caching 

The idea behind browser caching is to assign an expiration date to concrete resources (commonly grouped by file type) on the web server. When the browser requests a specific resource from the server, it caches the response; on a subsequent request, if the resource's expiration date has not yet passed, the browser loads that resource from its local cache instead of requesting it from the server again. This in turn reduces the client-server communication and loads pages much faster. Caching also reduces server bandwidth, which is important for high-traffic sites.

In order to leverage browser caching you should edit the .htaccess file in the root directory of your site.
Of course, we assume your hosting server supports this. If there is no .htaccess file, create it and enter the following:


ExpiresActive on
ExpiresDefault A0

# 1 YEAR - doesn't change often
<FilesMatch "\.(flv|ico|pdf|avi|mov|ppt|doc|mp3|wmv|wav)$">
ExpiresDefault A29030400
</FilesMatch>

# 1 WEEK - possible to be changed, unlikely
<FilesMatch "\.(jpg|jpeg|png|gif|swf|js|css)$">
ExpiresDefault A604800
</FilesMatch>

# 3 HOUR - core content, changes quickly
<FilesMatch "\.(txt|xml)$">
ExpiresDefault A10800
</FilesMatch>

The numbers are expiration times in seconds, counted from the moment the client accesses the resource (that is what the leading "A" stands for). If you want, you can change them to suit your needs.

Now that you've saved the .htaccess file on your server, the server (if it supports this) will attach the corresponding expiration headers to responses for the file types we've listed.

And that's it "leverage browser caching" done! Now we step into gzip/deflate compression.

Enable gzip/deflate compression on your site

gzip/deflate compression is a method of compressing the resources requested from the web server. What happens is that, before sending a resource to the client, the web server compresses it with gzip or deflate and then sends it. This in turn "lightens" the page (the transferred size is smaller) and again helps the client in terms of page load. The client, commonly a browser, decompresses the page and shows it to the user in its original form, so it's a lossless transformation.

To enable compression of your web site's content, open the .htaccess file again and add the following before the caching options (we'll be using deflate compression):



AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript 


Save the .htaccess file and you're ready to test your page speed; check out PageSpeed Insights and let me know your results in the comments! A quick way to verify the compression is shown below.
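One quick check that compression actually kicked in is to request a page with an Accept-Encoding header and look for Content-Encoding in the response headers (the URL here is just a placeholder):

curl -s -I -H "Accept-Encoding: gzip,deflate" http://www.example.com/ | grep -i content-encoding

If the output shows Content-Encoding: gzip (or deflate), your .htaccess rules are being applied.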
 
      


Enable userdir apache module on Ubuntu

What is userdir? 

userdir is a module of the apache2 web server that gives every user on a multiuser system like Linux (apart from the root user) a publicly accessible directory on the web server. For every user on the system there is a corresponding address. To be more precise, if there are users john and peter on the system, the corresponding addresses would be:

http://hostname/~john/    --> mapped to /home/john/public_html/
http://hostname/~peter/   --> mapped to /home/peter/public_html/

So every user on the system can have their own web site.

Why do I need apache2 userdir mod?

If you have multiple web projects you are working on and don't want to bother with the root account all the time, userdir is the way to go.

Ok so how do I enable this userdir mod? 

If you have the apache2 web server installed, then the userdir module is already available, you just have to enable it. To do this, open your terminal (CTRL + ALT + T) and enter the following command:

sudo  a2enmod userdir

This command enables the userdir module; now you need to create the public_html directory under your home path. Your home path will be /home/yourusername/, so that's where you create the new directory,
like so:

mkdir public_html

and add privileges:

chmod 755 public_html

After this step you need to restart your apache2 server so the changes are applied. 

sudo service apache2 restart
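If you want to double-check that the module really is loaded after the restart, listing apache2's active modules is a quick way (assuming a standard Ubuntu apache2 install):

apache2ctl -M | grep userdir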

Ok, now when you create something in the public_html dir you can access it via a web browser. So create a file index.html and fill it with some text. It will be accessible via HTTP from a web browser:

http://localhost/~yourusername/index.html

That's nice, but PHP is not interpreting my code. What to do?

By default the PHP interpreter is disabled for userdirs, so you'll have to enable that as well. To do so, edit the php5.conf file:

sudo gedit /etc/apache2/mods-enabled/php5.conf

and you'll see something like this: 


<FilesMatch ".+\.ph(p[345]?|t|tml)$">

    SetHandler application/x-httpd-php

</FilesMatch>

<FilesMatch ".+\.phps$">

    SetHandler application/x-httpd-php-source

    # Deny access to raw php sources by default

    # To re-enable it's recommended to enable access to the files

    # only in specific virtual host or directory

    Order Deny,Allow

    Deny from all

</FilesMatch>

# Deny access to files without filename (e.g. '.php')

<FilesMatch "^\.ph(p[345]?|t|tml|ps)$">

    Order Deny,Allow

    Deny from all

</FilesMatch>



# Running PHP scripts in user directories is disabled by default

#

# To re-enable PHP in user directories comment the following lines

# (from <IfModule ...> to </IfModule>.) Do NOT set it to On as it

# prevents .htaccess files from disabling it.

<IfModule mod_userdir.c>

    <Directory /home/*/public_html>

        php_admin_flag engine Off

    </Directory>

</IfModule>


Comment out the last five lines (from <IfModule ...> to </IfModule>) and you should get:


<FilesMatch ".+\.ph(p[345]?|t|tml)$">

    SetHandler application/x-httpd-php

</FilesMatch>

<FilesMatch ".+\.phps$">

    SetHandler application/x-httpd-php-source

    # Deny access to raw php sources by default

    # To re-enable it's recommended to enable access to the files

    # only in specific virtual host or directory

    Order Deny,Allow

    Deny from all

</FilesMatch>

# Deny access to files without filename (e.g. '.php')

<FilesMatch "^\.ph(p[345]?|t|tml|ps)$">

    Order Deny,Allow

    Deny from all

</FilesMatch>



# Running PHP scripts in user directories is disabled by default

#

# To re-enable PHP in user directories comment the following lines

# (from <IfModule ...> to </IfModule>.) Do NOT set it to On as it

# prevents .htaccess files from disabling it.

#<IfModule mod_userdir.c>

#    <Directory /home/*/public_html>

#        php_admin_flag engine Off

#    </Directory>

#</IfModule>


Now save the changes (CTRL + S) and restart the apache2 web server again.

sudo service apache2 restart

Now your PHP files under the userdir will be interpreted by the PHP interpreter. More information on apache2 userdir can be found at http://httpd.apache.org/docs/2.2/mod/mod_userdir.html.