Tuesday, November 10, 2009

Branch and Bound etc.

The three patterns in this set are Branch and Bound, Graphical Models, and Structured Grids. First, Branch and Bound. Amongst the four operations of branching, evaluation, bounding, and pruning, I feel that the most important and valuable one is evaluation, the application of the objective function to a point in the search space. Evaluation becomes harder as the search space grows. Pruning a branch using the bounding strategy is fairly simple, since it is easy to determine when the lower bound of a linear integer programming problem on a branch is greater than or equal to the value of the current best solution. I feel that this kind of optimization is also very similar to the Monte Carlo pattern that we studied earlier in the class. The flowchart given in the description demonstrates this pattern very well. Graphical Models is the next pattern in this set, though it doesn't have much of a description. In addition to the examples mentioned, I would add Bayesian inference as one of the possible algorithms that uses this pattern. Code snippets would have helped readers not familiar with the mechanisms described in the paper. The last pattern is Structured Grids. This is one of many patterns applicable to problems in image processing and computer vision. The description mentions that problems following this pattern perform updates until a particular error threshold is reached, which seems to me closely linked to the granularity of the tasks to be performed. Overall, this set of patterns was more closely linked to problems I could relate to in the real world, and was a very interesting read.
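To make the four operations concrete, here is a minimal sketch of branch and bound for a tiny, made-up integer programming instance; the numbers, variable names, and the simple "committed cost" bound are my own illustrative assumptions, not anything from the pattern description.

```python
# Hypothetical instance: choose each x_i in {0, 1} to minimize sum(c_i * x_i)
# subject to sum(a_i * x_i) >= b.  All numbers below are made up for illustration.
costs = [4, 3, 5, 6]       # objective coefficients c_i (all non-negative)
weights = [2, 3, 4, 5]     # constraint coefficients a_i
required = 7               # the constraint right-hand side b

best_value = float("inf")  # value of the incumbent (best complete solution so far)
best_assignment = None

def committed_cost(partial):
    """Lower bound on any completion of `partial`: the cost already committed,
    since the remaining variables can only add non-negative cost."""
    return sum(c for c, x in zip(costs, partial) if x)

def branch(partial):
    global best_value, best_assignment
    # Pruning: if even the optimistic bound cannot beat the incumbent, give up.
    if committed_cost(partial) >= best_value:
        return
    if len(partial) == len(costs):
        # Evaluation: apply the objective to a complete point in the search space.
        if sum(w for w, x in zip(weights, partial) if x) >= required:
            best_value, best_assignment = committed_cost(partial), partial[:]
        return
    # Branching: extend the partial assignment with each choice for the next variable.
    for choice in (0, 1):
        branch(partial + [choice])

branch([])
print(best_value, best_assignment)   # expect cost 8 with items 1 and 2 chosen
```

Even in this toy form, the pruning test is what keeps the search from enumerating every leaf, which is exactly the point the pattern makes.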

Shared Queue, Speculation and Digital Circuits

The first thing I would like to talk about is the Shared Queue pattern. The first aspect is the context in which it could occur. I have come across many graph mining problems which run serially on single units of execution; parallelizing them using a shared queue could be an interesting research area to look into. I completely agree with the notion that more complicated constructs could increase the error rate of the code. In the solution, the most important point is defining the shared ADT, since the rest of the process depends on it. As Prof. Johnson mentioned in today's class, this paper has a lot of code and seems better written than some of the other patterns we read earlier. It will be interesting to read more about the choice of nested locks here. The second paper is about Speculation, which seemed to me the most interesting of the three patterns. The commit/recovery mechanism that depends on the status of a predicate seems to be the most complicated of the tasks mentioned in the paper. I would love to see more elaboration on the recovery mechanism, especially on the more intelligent recovery mechanisms mentioned in the paper. I searched for more information and examples regarding this pattern but couldn't find many.
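Going back to the shared queue for a moment, here is a minimal Python sketch of the ADT-plus-lock idea. The class and method names are my own, and I use a single lock and condition variable rather than the nested locking discussed in the paper, purely to show how the ADT's operations become the only way to touch the underlying container.

```python
import threading
from collections import deque

class SharedQueue:
    """A minimal shared-queue ADT: all access to the underlying deque
    goes through put/get, which serialize on a single lock."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()
        self._not_empty = threading.Condition(self._lock)

    def put(self, item):
        with self._not_empty:
            self._items.append(item)
            self._not_empty.notify()      # wake one waiting consumer

    def get(self):
        with self._not_empty:
            while not self._items:        # block until an item is available
                self._not_empty.wait()
            return self._items.popleft()

# Example use: several workers draining tasks produced by the main thread.
if __name__ == "__main__":
    q = SharedQueue()
    results = []

    def worker():
        while True:
            task = q.get()
            if task is None:              # sentinel to shut the worker down
                break
            results.append(task * task)

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for n in range(10):
        q.put(n)
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
    print(sorted(results))
```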

The last paper is about the Digital Circuits pattern. This paper seemed to require the most introspection of the three. Among the examples given, I could relate most to the databases example because I have been working on this, though not on the scale mentioned in the paper. For most bit-level operations that I have encountered in C, I have had to use the remaining bits in a word intelligently so as not to waste memory. Could we say that bit-level parallelism is an 'embarrassingly parallel' problem? Among the examples presented in the paper, I found the description of finding the nth Gray code very interesting. Fitting the data completely within a word is the key aspect here. Overall, I think the three patterns in this set are some of the most introspective I have read this entire semester.
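For reference, the standard formula for the nth reflected-binary Gray code is a single-word bit trick; the snippet below is my own illustration rather than the paper's code.

```python
def gray_code(n):
    """Return the nth reflected-binary Gray code.
    The whole computation fits in a single machine word: one shift and one XOR."""
    return n ^ (n >> 1)

# Consecutive Gray codes differ in exactly one bit.
for n in range(8):
    print(n, format(gray_code(n), "03b"))
```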

Thursday, November 5, 2009

Chp 6

This chapter discusses building applications, specifically the implementation of supervisors in OTP. Amongst the five behaviors mentioned in the chapter, I think the supervisor behavior is the most important of all. The hierarchical manner in which the system is built makes it easier to develop new releases. The supervision of worker nodes by the supervisor holds some parallels with other parallel programming languages that I have worked with. The generic_server API and the server example presented later in the chapter make it easier to understand the initial process, especially how exactly the arguments and function calls are used to implement the gen_server. The event manager principles are interesting, and I especially liked the description of how the event manager is like a generalized finite state machine. The application developed to illustrate the finite state machine API helps in understanding the uses of the packet assembler. The application API has been described in a clear and concise fashion, and the example helps us understand the primitive behaviors. From the discussion it is clear that systems built using OTP behaviors have a systematic structure. Overall, this chapter was a very interesting read and gave insights into how such systems can be put into practice.
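The chapter's examples are in Erlang, but as a rough analogy (my own sketch, not the book's code, with made-up worker names and restart policy) a one_for_one-style supervisor is essentially a loop that watches its children and restarts any that die:

```python
import multiprocessing
import random
import time

def worker(name):
    """A worker that occasionally crashes, to give the supervisor something to do."""
    while True:
        time.sleep(0.5)
        if random.random() < 0.2:
            raise RuntimeError(f"{name} crashed")

def supervise(names, max_restarts=5):
    """Very rough analogue of a one_for_one supervisor: each dead child is
    restarted individually, up to a restart limit."""
    children = {n: multiprocessing.Process(target=worker, args=(n,)) for n in names}
    for p in children.values():
        p.start()
    restarts = 0
    while restarts < max_restarts:
        time.sleep(0.5)
        for name, proc in children.items():
            if not proc.is_alive():       # child died: log-and-restart it
                restarts += 1
                children[name] = multiprocessing.Process(target=worker, args=(name,))
                children[name].start()
    for p in children.values():
        p.terminate()

if __name__ == "__main__":
    supervise(["packet_assembler", "event_logger"])
```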

Part 1 - Pipelining and parallelism

This post is about data parallelism, pipelining, and geometric decomposition. First, pipelining. One of the interesting aspects is the throughput of the data processed by the pipeline. A real-world analogy comes from my personal experience with video processing: to render a 3D video from multiple stereoscopic cameras, a smooth result is obtained only if the video streams from each camera are combined to achieve the required frame rate. Although this is not strictly a parallel programming effort, I can find some similarities here. Another interesting aspect of this paper is the point that the slowest computation becomes the bottleneck of the entire concurrent pipeline. Plugging objects or frameworks into the pipeline is another interesting aspect mentioned in the paper. The second part is about data parallelism. I came across this article which talks about data parallelism. Sometimes this method of processing data utilizes the capacity of a multi-core machine to the maximum extent. At the end of the post, they talk about a real-life implementation of this on an advanced German nuclear fusion platform, the ASDEX tokamak. The last part is about the Geometric Decomposition pattern. I feel that one of the most important elements of the solution is the data decomposition part, especially the granularity of the decomposition. Another important aspect is the need to balance decomposition against code reuse, especially in larger systems. Though I have worked with this kind of pattern, I would love to hear about more instances of it through other real-world examples.
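Returning to pipelining, here is a minimal sketch of a three-stage pipeline built from threads and queues; the stage names and the toy workload are my own, chosen only to illustrate how throughput is set by the slowest stage once the pipeline fills.

```python
import queue
import threading

def stage(name, work, inbox, outbox):
    """Generic pipeline stage: pull an item, apply this stage's work, pass it on."""
    while True:
        item = inbox.get()
        if item is None:                  # propagate the shutdown sentinel downstream
            if outbox is not None:
                outbox.put(None)
            break
        result = work(item)
        if outbox is not None:
            outbox.put(result)
        else:
            print(name, "produced", result)

if __name__ == "__main__":
    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    stages = [
        threading.Thread(target=stage, args=("decode", lambda x: x * 2, q1, q2)),
        threading.Thread(target=stage, args=("filter", lambda x: x + 1, q2, q3)),
        threading.Thread(target=stage, args=("encode", lambda x: x ** 2, q3, None)),
    ]
    for t in stages:
        t.start()
    for frame in range(5):                # once the pipeline fills, stages overlap in time
        q1.put(frame)
    q1.put(None)                          # shut the pipeline down
    for t in stages:
        t.join()
```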

Thursday, October 22, 2009

Chp 5

To add to the definitions of a fault-tolerant system (as mentioned in the introduction of this chapter), I was reminded of a research paper I came across a few days back which talked about all software faults resulting from design errors, and of the changing paradigm presented in it. I like the intuitive way the hierarchy of tasks has been organized and described, and in many ways it parallels most computing problems (or even problems we come across in our daily lives). The distinction between errors, exceptions, etc. has been described very well, and I am somehow reminded of the programming model of Java (which I am hugely familiar with), which is different from that of Erlang. The division into supervisor and worker processes seems similar to concepts in distributed computing, where the goal is to divide a task among n processors. Overall, this chapter has been written in a simple and concise manner, and among other things, the description of well-behaved functions stands out in this work.

Summary of Part 1

This post gives a review of the Dense Linear Algebra, Monte Carlo, and Graph Algorithms patterns. The first thing to talk about is the Dense Linear Algebra pattern. The problem of maximizing performance while balancing data movement is an interesting one, and I am sure it is similar to problems encountered in many different areas of Computer Science. Being a student with research interests in data mining, I come across such problems frequently. One of the parallels in my research is minimizing the generation of candidate sets (or avoiding them completely), and some of the best algorithms are the ones able to utilize the available computing power and deliver maximum performance without imposing too much overhead on the system. Whether an algorithm is implementation-dependent (or problem-dependent) is also an interesting question, since many algorithms must be able to generalize over a wider range of datasets. The paper is interesting as it gives an overview of how problems can be modeled as standard equations (or problems) in linear algebra and solved through the use of libraries.
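To illustrate the "model it as a standard problem and hand it to a library" point, here is a tiny example (the matrix and right-hand side are made up) of solving a dense system with NumPy, which dispatches to tuned BLAS/LAPACK routines underneath.

```python
import numpy as np

# A small, made-up dense system A x = b; real workloads would be far larger.
A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 5.0]])
b = np.array([7.0, 4.0, 7.0])

x = np.linalg.solve(A, b)        # LAPACK does the actual factorization work
print(x)
print(np.allclose(A @ x, b))     # check the residual
```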

The second paper talks about the Graph Algorithms pattern. I can relate to this because I encounter it frequently when solving graph-mining problems. Modeling a data mining problem as a graph-mining problem that uses graph algorithms is interesting, and one of the useful aspects mentioned in the writeup is determining theoretical bounds on the performance of different algorithms. For example, one interesting problem which came to mind on reading the various examples is determining dense clusters of friends in a social network graph. I also feel that ignoring the special properties of the graphs (in the context of the problem being studied) does not help us develop efficient algorithms which take advantage of those properties. The writeup gives a nice overview of recognizing different types of graph structures (and I think I will refer back to this page when I want to refresh concepts in graph algorithms :)). The last paper is about the Monte Carlo pattern. I first came across these methods as an undergraduate student when I was reading papers on simulated annealing and particle swarm optimization. I feel that Monte Carlo methods are very useful in solving problems where no known optimal solution exists.
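As a minimal illustration of the Monte Carlo idea (my own toy example, not one from the paper): estimating π by random sampling captures the "many independent random trials, then aggregate" flavour of the pattern, and because the trials are independent, the loop parallelizes trivially.

```python
import random

def estimate_pi(samples=1_000_000):
    """Estimate pi by counting how many random points in the unit square
    fall inside the quarter circle of radius 1."""
    inside = sum(
        1
        for _ in range(samples)
        if random.random() ** 2 + random.random() ** 2 <= 1.0
    )
    return 4.0 * inside / samples

print(estimate_pi())
```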

Tuesday, October 20, 2009

Review of Map-Reduce

This writeup gives an overview of the Map-Reduce pattern and its various uses. It gives a brief overview of the two phases involved in Map-Reduce: the map (computation) phase and the reduction phase.
The issues that occur while distributing the smaller computations to the worker nodes, as well as the inherent problems involved in reduction, are explained clearly. The MapReduce concept is easy to understand from the logical viewpoint of (key, value) pairs, where the input is taken in the form of data in one domain and the output is produced in a different domain. Other concepts of MapReduce, such as the distribution of tasks and the reliability of results, are equally important, and even though they are not covered in this writeup, they are worth mentioning. Of great importance is seeing the MapReduce implementation of the PageRank algorithm. Earlier, during the initial stages of MapReduce, there was a certain amount of backlash from the database community because it promoted brute-force scanning instead of indexing and did not rely on many of the DBMS tools that people have used over the years.
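To make the two phases concrete, here is a minimal, sequential word-count sketch (my own toy example, not any particular framework's API); a real MapReduce system would distribute the map tasks across workers and group the intermediate (key, value) pairs for the reducers.

```python
from collections import defaultdict

def map_phase(document):
    """Map: turn input in one domain (text) into (key, value) pairs in another."""
    return [(word, 1) for word in document.split()]

def reduce_phase(key, values):
    """Reduce: combine all values that share a key."""
    return key, sum(values)

documents = ["the quick brown fox", "the lazy dog", "the fox"]

# Shuffle step: group intermediate pairs by key before reduction.
grouped = defaultdict(list)
for doc in documents:
    for key, value in map_phase(doc):
        grouped[key].append(value)

counts = dict(reduce_phase(k, vs) for k, vs in grouped.items())
print(counts)   # e.g. {'the': 3, 'quick': 1, ...}
```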