Python wins
Speed and performance:
Java wins
This issue arose just recently at work. We needed a simple command-line tool to read from standard input, apply some simple logic, and write the output to specified files. Developing this in Python was a breeze. Perhaps all of 20 lines of code:
import sys

def hashCode(string):
    ''' Port of Java's String.hashCode(), minus the 32-bit overflow (the parity used below is unaffected) '''
    h = 0
    for ii in range(len(string)):
        h = 31 * h + ord(string[ii])
    return h
try:
    files = {}
    files[0] = open('my_output_file_A.txt', 'w')
    files[1] = open('my_output_file_B.txt', 'w')
    cnt = 0
    for l in sys.stdin:                      ## read from stdin
        key = l.split('\t')[0]               ## split on TAB and get first column
        ext = hashCode(key.strip()) % 2      ## hash key; 0 = file A, 1 = file B
        files[ext].write(l)
        cnt += 1
    print('Total lines : %d' % cnt)
except KeyboardInterrupt:                    ## let Ctrl-C stop the run quietly
    pass
Python performance on 10 million lines of input: 58 sec.
Not bad really. I was happy, until I happened to be doing something very similar with only basic Unix commands (i.e. AWK).
Time for similar processing on the same 10 million lines: ~10 sec.
What's up with that? Perhaps it was the hashCode function? Profiling showed that half the time was spent on I/O and only a quarter in the function. That leaves the remaining quarter, roughly 14 of the 58 seconds, for everything else. So even with zero I/O and no hashCode method we're still slower than a comparable command-line script?
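For anyone who wants to repeat that kind of measurement, here is a minimal sketch of profiling the same loop with the standard library's cProfile. The choice of profiler, the names hash_code, split_lines and profile_split, and the synthetic in-memory input are all my assumptions for illustration; writing to StringIO will also understate real file I/O.

```python
import cProfile
import io
import pstats


def hash_code(string):
    """Same arithmetic as the hashCode() function in the script above."""
    h = 0
    for ch in string:
        h = 31 * h + ord(ch)
    return h


def split_lines(lines, outputs):
    """Route each line to outputs[0] or outputs[1] by the hash of its first column."""
    count = 0
    for line in lines:
        key = line.split('\t')[0]
        outputs[hash_code(key.strip()) % 2].write(line)
        count += 1
    return count


def profile_split(num_lines=20000):
    """Profile split_lines on synthetic input; return (line count, report text)."""
    # Hypothetical sample data: the real input format is not shown here.
    lines = ['key%d\tvalue%d\n' % (i, i) for i in range(num_lines)]
    outputs = [io.StringIO(), io.StringIO()]

    profiler = cProfile.Profile()
    count = profiler.runcall(split_lines, lines, outputs)

    # Collect the per-function timing table into a string instead of stdout.
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats('cumulative').print_stats(10)
    return count, report.getvalue()


if __name__ == '__main__':
    total, text = profile_split()
    print('Total lines : %d' % total)
    print(text)
```

The cumulative column in the report is where to look for how the time divides between the hash function, the writes, and the loop itself.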
Hmm, let me see how this would perform in a compiled language instead of an interpreted one. Porting this code over to Java took a bit longer than the Python counterpart. Long story short:
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
while ((line = br.readLine()) != null) {
    cnt++;
    idx = line.indexOf('\t');
    mod = line.substring(0, idx).hashCode() % 2;
    if (mod < 0) mod = mod + 2;       /* Python/corrected modulus */
    files[mod].write(line + '\n');    /* BufferedWriter[2]; code omitted for space */
}
br.close();
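The "corrected modulus" line is there because Java's int overflows and its % takes the sign of the dividend, while Python integers never overflow and % with a positive divisor is never negative. Here is a rough sketch of why both versions still pick the same file. The function names are mine, and it assumes keys contain only characters in the Basic Multilingual Plane, so that Python code points match Java chars.

```python
def java_string_hashcode(text):
    """Emulate Java's String.hashCode() with 32-bit signed wraparound."""
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF   # keep only the low 32 bits
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - 0x100000000 if h >= 0x80000000 else h


def java_remainder(a, b):
    """Java's % operator: the result takes the sign of the dividend."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def bucket_java_style(key):
    """Bucket index as in the Java loop: remainder, then shift negatives up by 2."""
    mod = java_remainder(java_string_hashcode(key), 2)
    if mod < 0:
        mod += 2
    return mod


def bucket_python_style(key):
    """Bucket index as in the Python script: unbounded hash, Python's % operator."""
    h = 0
    for ch in key:
        h = 31 * h + ord(ch)
    return h % 2


if __name__ == '__main__':
    # Hypothetical keys, including one long enough to overflow 32 bits.
    for key in ['a', 'hello', 'some much longer key that overflows']:
        print(key, bucket_java_style(key), bucket_python_style(key))
```

The wraparound only discards high bits, so the lowest bit, which is all that % 2 looks at, comes out the same in both.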