Thursday, October 22, 2009

Chp 5

To add to the definitions of a fault-tolerant system (as given in the introduction of this chapter), I was reminded of a research paper I came across a few days back which argued that all software faults ultimately result from design errors, and of the changing paradigm presented there. I like the intuitive way the hierarchy of tasks has been organized and described, and in many ways it parallels most computing problems (or even problems we come across in our daily lives). The distinction between errors, exceptions, and so on is explained very well, and I am somehow reminded of the programming model of Java (which I am hugely familiar with), which differs from that of Erlang. The division into supervisor and worker processes seems similar to concepts in distributed computing, where the goal is to divide a task among n processors. Overall, this chapter has been written in a simple and concise manner, and, among other things, the description of well-behaved functions stands out in this work.
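Thinking about the supervisor/worker split some more, here is a rough Java analogy I put together myself (the chapter, of course, is about Erlang, so this is only a sketch of the idea and not the chapter's code): the supervisor does no real work of its own; it simply notices that a worker has failed and restarts it.

```java
import java.util.concurrent.*;

// A rough Java analogy (my own sketch, not from the chapter) of the
// supervisor/worker split: the supervisor's only job is to notice a
// failed worker and restart it, rather than handle the error itself.
public class SupervisorSketch {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Runnable worker = () -> {
            // Hypothetical worker: fails randomly to simulate a fault.
            if (Math.random() < 0.5) {
                throw new IllegalStateException("worker crashed");
            }
            System.out.println("work completed");
        };

        for (int attempt = 1; attempt <= 3; attempt++) {
            Future<?> result = pool.submit(worker);
            try {
                result.get();           // supervisor waits for the worker
                break;                  // success: nothing left to supervise
            } catch (ExecutionException e) {
                System.out.println("restarting after: " + e.getCause());
            }
        }
        pool.shutdown();
    }
}
```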

Summary of Part 1

This post reviews the dense linear algebra, Monte Carlo, and graph algorithms patterns. The first is the dense linear algebra pattern. The problem of maximizing performance while balancing data movement is an interesting one, and it is similar to problems encountered in many other areas of Computer Science. As a student with research interests in Data Mining, I come across such problems frequently. One of the parallels drawn in my research is minimizing the generation of candidate sets (or avoiding them completely), and some of the best algorithms are the ones which are able to utilize the available computing power and deliver maximum performance without imposing too much overhead on the system. The question of whether an algorithm is implementation-dependent (or problem-dependent) is also an interesting aspect, and many algorithms must be able to generalize over a wide range of datasets. The paper is interesting because it gives an overview of how problems can be modeled as standard equations (or problems) in linear algebra and then solved through the use of libraries.
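To make this concrete, here is a minimal sketch (my own example, not from the paper) of the kind of kernel the dense linear algebra pattern is concerned with: a dense matrix-vector product. Tuned libraries (BLAS and the like) restructure loops like these to minimize data movement, which is exactly the performance-versus-data-movement tension mentioned above.

```java
// A minimal sketch of a dense linear algebra kernel: y = A * x.
// The sizes and names here are my own; library implementations block
// and reorder these loops to keep data in cache as long as possible.
public class MatVec {
    static double[] multiply(double[][] a, double[] x) {
        int rows = a.length, cols = x.length;
        double[] y = new double[rows];
        for (int i = 0; i < rows; i++) {
            double sum = 0.0;
            for (int j = 0; j < cols; j++) {
                sum += a[i][j] * x[j];   // one multiply-add per matrix entry
            }
            y[i] = sum;
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[] x = {5, 6};
        double[] y = multiply(a, x);
        System.out.println(y[0] + ", " + y[1]);  // 17.0, 39.0
    }
}
```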

The second paper talks about the graph algorithms pattern. I can relate to this because I encounter it frequently when solving graph-mining problems. The problem of modeling a data mining problem as a graph-mining problem using graph algorithms is interesting, and one of the useful aspects (mentioned in the writeup) is determining theoretical bounds on the performance of different algorithms. For example, one interesting problem which came to mind (on reading the various examples mentioned) is determining dense clusters of friends in a social network graph. I also feel that ignoring the special properties of the graphs (in the context of the problem we are studying) does not help us develop efficient algorithms which take advantage of those properties. The writeup gives a nice overview of recognizing different types of graph structures (and I think I will refer back to this page when I want to refresh concepts in graph algorithms :)). The last paper is about Monte Carlo patterns. I first came across these patterns as an undergraduate student when I was reading papers on simulated annealing and particle swarm optimization. I feel that Monte Carlo methods are very useful for solving problems where no known optimal solution exists.
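To illustrate the Monte Carlo idea with the simplest example I know (my own sketch, not from the writeup), here is the classic estimation of pi by random sampling; the same shape of computation shows up in simulated annealing and particle swarm optimization, just with more interesting objective functions.

```java
import java.util.Random;

// A tiny Monte Carlo sketch (my own example, not from the writeup):
// estimate pi by sampling random points in the unit square and counting
// how many fall inside the quarter circle of radius 1.
public class MonteCarloPi {
    public static void main(String[] args) {
        Random rng = new Random(42);
        long samples = 1_000_000, inside = 0;
        for (long i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        // Area of quarter circle / area of square = pi / 4.
        System.out.println("pi ~ " + 4.0 * inside / samples);
    }
}
```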

Tuesday, October 20, 2009

Review of Map-Reduce

This writeup gives an overview of the MapReduce pattern and its various uses, briefly covering the two phases involved: the map (computation) phase and the reduce phase.
The issues that occur while distributing the smaller computations to the worker nodes, as well as the inherent problems involved in reduction, are explained clearly. The MapReduce concept is easy to understand from the logical viewpoint of key/value pairs, where the input is taken as data in one domain and the output is produced in a different domain. Other aspects of MapReduce, such as the distribution of tasks and the reliability of results, are equally important, and even though they are not covered in this writeup, they are worth mentioning. It was also valuable to see the MapReduce implementation of the PageRank algorithm. In the early days of MapReduce there was a certain amount of backlash from the database community, because it promoted brute-force scanning instead of indexing and did not rely on many of the DBMS tools that people have used over the years.
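As a small aside, here is a toy, single-machine word-count sketch I wrote to illustrate the key/value view described above; a real MapReduce framework would, of course, distribute the map and reduce calls across worker nodes and handle the shuffle, task distribution, and failures itself.

```java
import java.util.*;

// A toy, single-machine sketch of the map/reduce idea (word count),
// written only to illustrate the key/value view; the "shuffle" here is
// just an in-memory grouping by key.
public class WordCountSketch {
    // Map phase: each input line emits (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return pairs;
    }

    // Reduce phase: sum the counts collected for one key.
    static int reduce(String word, List<Integer> counts) {
        int total = 0;
        for (int c : counts) total += c;
        return total;
    }

    public static void main(String[] args) {
        String[] input = {"the quick brown fox", "the lazy dog"};
        Map<String, List<Integer>> grouped = new HashMap<>();
        for (String line : input) {                       // "shuffle": group by key
            for (Map.Entry<String, Integer> p : map(line)) {
                grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
            }
        }
        grouped.forEach((w, c) -> System.out.println(w + " -> " + reduce(w, c)));
    }
}
```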

Thursday, October 15, 2009

Review of Chess

This paper presents a tool for finding Heisenbugs in concurrent programs. Systematic exploration of program behavior is one of the important features of Chess, enabling it to find bugs in a short time. One of the first things that strikes me about this paper is that the authors describe two real bugs found using Chess, and how they were subsequently fixed. Another clever aspect of the tool is its control over thread execution. Even though the choice of algorithms for the search phase is a challenging problem, I think Chess handles this phase well.

The ability to replay a given concurrent execution in a controlled manner is a goal that has been described very well in this paper. By providing a kind of 'restart' through its cleanup functions, Chess ensures that every run of the program is handled well. There is also a subtle emphasis on leaving some choices to the user, especially in terms of providing deterministic inputs, and this is one of the interesting aspects of this work.
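For readers unfamiliar with the term, here is the classic kind of Heisenbug (my own toy example, not one of the bugs from the paper) that a systematic scheduler like Chess is built to expose: an unsynchronized counter that loses updates only under particular interleavings, which are hard to reproduce by simply rerunning the program.

```java
// A classic Heisenbug sketch (my own, not from the paper): two threads
// doing unsynchronized increments will occasionally lose updates, but
// only under particular interleavings.
public class RacyCounter {
    static int counter = 0;   // shared, unsynchronized

    public static void main(String[] args) throws InterruptedException {
        Runnable increment = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++;    // read-modify-write: not atomic
            }
        };
        Thread t1 = new Thread(increment);
        Thread t2 = new Thread(increment);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Expected 200000; often less, depending on the interleaving.
        System.out.println("counter = " + counter);
    }
}
```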

Tuesday, October 13, 2009

BA Chapter 14

This chapter gives a rather nostalgic overview of the basic architectural ideas behind the design of Smalltalk. The question of inheritance and its good/bad uses is addressed early on, and the description of Smalltalk itself supports that discussion. A while back, I came across a post describing how Smalltalk might make a comeback. The author of that post argues that Smalltalk was ahead of its time, and that now that OO concepts are firmly implanted in the heads of every beginning programmer, Smalltalk might be an option to consider. Many of the benefits espoused by languages such as Python and Ruby were present much earlier in Smalltalk, and I think many people (who are unfamiliar with it) overlook this. People who like concepts such as metaprogramming and dynamic typing will find that Smalltalk is well suited to their needs. Then again, the regrowth in popularity of Smalltalk might just be a temporary fad, as much of what caused its earlier "problems" has since been addressed by languages such as Java, by the growth and advancement of hardware, and by software that could deliver "executables". With the recent growth in web applications, I think Smalltalk frameworks such as Seaside should become popular as well.

Wednesday, October 7, 2009

Review of ReLooper

This paper describes the ways in which an Eclipse-based tool called ReLooper helps programmers parallelize their programs. It makes use of the ParallelArray data structure for arrays. The paper places particular emphasis on determining when parallelizing a program would be unsafe (thread-unsafe). A good deal of user interactivity is, I think, one of the important features of ReLooper. The evaluation of whether the refactored programs have conflicting memory accesses is useful, and from it, reporting all possible race conditions (i.e., having no false negatives) appears to be one of the several benefits of this work. There also seems to be a good deal of speedup achieved (on both 1- and 2-core machines) compared to the original code. The evaluation methodology, while by no means comprehensive, looks sufficient to show the inherent advantages of using ReLooper. I tend to disagree with some of the evaluation results on the time taken (especially for the machine learning algorithms); it is hard to judge them in the absence of other techniques for comparison. Even with all the inherent advantages of ReLooper, I do not yet see a lot of practical applications, since many real-life programs are much more complicated, and it remains to be seen how research would progress from this stage to the point where many different types of code could be refactored in a similar manner.
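Since I do not have ReLooper's actual ParallelArray output in front of me, here is my own rough before/after sketch, using plain java.util.concurrent instead, of what refactoring a sequential loop for parallelism amounts to; the key safety condition, as the paper stresses, is that the loop iterations must not have conflicting accesses to shared memory.

```java
import java.util.concurrent.*;

// A rough before/after sketch (my own, not ReLooper's ParallelArray
// output) of refactoring a sequential loop for parallelism: the loop
// body must be free of conflicting writes before it is safe to split.
public class LoopRefactorSketch {
    // Before: a plain sequential loop.
    static void sequentialSquare(double[] data) {
        for (int i = 0; i < data.length; i++) {
            data[i] = data[i] * data[i];
        }
    }

    // After: the same loop split across a thread pool; each task touches
    // a disjoint index range, so there are no conflicting memory accesses.
    static void parallelSquare(double[] data, int chunks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        int chunkSize = (data.length + chunks - 1) / chunks;
        for (int c = 0; c < chunks; c++) {
            final int start = c * chunkSize;
            final int end = Math.min(start + chunkSize, data.length);
            pool.submit(() -> {
                for (int i = start; i < end; i++) {
                    data[i] = data[i] * data[i];   // disjoint ranges: no races
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        double[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        parallelSquare(data, 4);
        for (double d : data) System.out.print(d + " ");
    }
}
```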

Tuesday, October 6, 2009

Software Architecture: OO Vs Func.

This chapter provides an interesting study of the benefits and disadvantages of functional programming versus object-oriented design. Before delving into the details presented in this chapter, I would like to talk about a similar article that I came across a few days back. That article provided motivation for taking up Haskell for a programmer who is used to OO languages like C++, Java, and C#. The author makes the case for Haskell by pointing to immutable objects, higher-order functions, inclusion polymorphism, and the fact that functional programmers are typically concerned with how data is constructed rather than what to do with the data (which is the case for OO programmers). Another article along similar lines also makes the case for the Haskell programming language.

One of the things that I would like to point out after reading the chapter is that the same metrics cannot be used for comparing the two different programming styles. The argument for considering better modularity of code (as a criterion for comparing programming styles) is valid, especially when developing large software systems. Sticking functions together with "glue" is an interesting aspect, and even though the specifics are clearly mentioned, I feel the details deserve a somewhat more subjective treatment.
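To make the "glue" idea concrete in the language I know best, here is a small Java 8 sketch of my own (not from the chapter) where a single generic fold is reused, via higher-order functions, to build several different computations.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// A small sketch (my own, not from the chapter) of higher-order
// functions as "glue": one generic fold is reused to build several
// different computations just by plugging in a combining function.
public class GlueSketch {
    static <T> T fold(List<T> items, T seed, BinaryOperator<T> combine) {
        T acc = seed;
        for (T item : items) {
            acc = combine.apply(acc, item);
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        int sum = fold(xs, 0, (a, b) -> a + b);        // 10
        int product = fold(xs, 1, (a, b) -> a * b);    // 24
        int max = fold(xs, Integer.MIN_VALUE, Math::max);
        System.out.println(sum + " " + product + " " + max);
    }
}
```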

Refactoring Sequential Java Code for Concurrency via Concurrent Libraries

This paper presents a way of restructuring sequential code into parallel code using concurrent utilities. It makes use of the java.util.concurrent framework in Java 5 and the ForkJoinTask framework in Java 7. I liked the argument that programming with locks is error-prone; though I believe this is a contentious topic, I tend to agree with the authors' comments on it. The frameworks mentioned in the paper address the research issues of usefulness, of making the existing code thread-safe, and finally of efficiency. The evaluation methodology used for Concurrencer is fairly comprehensive, and I doubt many questions could be raised about it.

In the implementation, I liked the use of ConcurrentHashMap, which avoids locking the entire map. The number of ways in which refactoring is supported is impressive and allows different kinds of sequential code patterns to be parallelized. Another important aspect of this paper is that the authors account for human intervention while retrofitting parallelism into sequential code. In this way, I believe they have allowed for future changes in the code (especially if drastic changes have to be made). It is also interesting to note that, among the three conversions studied in this paper, only the ConcurrentHashMap conversion requires a degree of human intervention.
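As a simplified sketch of my own (not the tool's actual output), this is the flavor of the HashMap-to-ConcurrentHashMap conversion: the check-then-act update is made atomic with putIfAbsent/replace, so no single lock has to be held over the whole map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// A simplified sketch (my own, not Concurrencer's generated code) of the
// kind of conversion described above: a synchronized HashMap update
// rewritten against ConcurrentHashMap.
public class MapRefactorSketch {
    // Before: every access serializes on a single lock.
    static final Map<String, Integer> before = new HashMap<>();
    static synchronized void recordBefore(String key) {
        Integer old = before.get(key);
        before.put(key, old == null ? 1 : old + 1);
    }

    // After: ConcurrentHashMap locks at a finer granularity; the
    // check-then-act is made atomic with putIfAbsent/replace.
    static final ConcurrentMap<String, Integer> after = new ConcurrentHashMap<>();
    static void recordAfter(String key) {
        while (true) {
            Integer old = after.putIfAbsent(key, 1);
            if (old == null || after.replace(key, old, old + 1)) {
                return;   // either inserted 1 or bumped the count atomically
            }
        }
    }

    public static void main(String[] args) {
        recordAfter("x");
        recordAfter("x");
        System.out.println(after.get("x"));   // 2
    }
}
```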

When the bazaar sets out to build cathedrals

In this chapter, the authors describe the relationship of ThreadWeaver and Akonadi to the KDE project, and the development process in large-scale open-source projects. One of the first things that struck me, even before the authors described these two KDE projects, was the focus on developing and maintaining quality code within open-source projects. The authors hit the nail on the head when they allude to the fact that open-source code is often of higher quality than proprietary code. This can also be seen in the huge success of community open-source development programs like the Google Summer of Code. The motivation of simply "reaching the finish line", rather than money or fame, seems to work really well.

The Akonadi project by itself has several benefits, and one of the interesting things that I noticed in its design was the focus on maintaining the stability of the overall system by providing separate processes for components requiring access to a particular kind of storage backend. Linking components to third-party libraries without compromising the stability of the overall system is another aspect which seemed advantageous in the initial design of the Akonadi architecture. The authors mention a series of code optimizations which seem potentially useful, and even though I have not looked into whether these have been implemented, I am interested in knowing the outcome. Does anyone know more about these optimizations?