Using Big Data technologies to perform Genomics Alignment
Results show that, using BigBWA on a Hadoop cluster in AWS with 32 cores, the user can obtain a speed-up of 26.7x. Using SparkBWA on a YARN/Spark cluster with 128 executors, the speed-up obtained is 85.6x.
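As a rough reading of these figures, dividing each speed-up by the number of workers gives a parallel efficiency. The sketch below assumes that the 32 cores and the 128 executors are the respective worker counts and that both speed-ups are measured against a comparable sequential baseline; neither assumption is stated above, and `parallel_efficiency` is a hypothetical helper name.

```python
def parallel_efficiency(speedup, workers):
    """Speed-up divided by worker count (1.0 would be ideal linear scaling)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers

# Figures reported above; treating cores/executors as the worker count is an assumption.
bigbwa = parallel_efficiency(26.7, 32)
sparkbwa = parallel_efficiency(85.6, 128)
print(f"BigBWA: {bigbwa:.2f}, SparkBWA: {sparkbwa:.2f}")
```

Under these assumptions the larger Spark cluster reaches a higher absolute speed-up at a lower per-worker efficiency, which is a common pattern when scaling out.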
High Performance Computing for Natural Language Processing
In this project we used Big Data technologies (Hadoop) and cloud computing to carry out Natural Language Processing (NLP) tasks. My main tasks were:
- Development of Java, Python and Perl programs within the MapReduce programming model.
- Performance analysis of different Natural Language Processing tools for their use with Big Data technologies.
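To illustrate the MapReduce programming model mentioned in the list above, here is a minimal single-process sketch of a word count. It is not code from the project: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the grouping step that a framework such as Hadoop performs between map and reduce is simulated in memory.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every token in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data for NLP", "NLP on big clusters"])
```

In a real cluster the mapper and reducer run on different nodes and only the key-value pairs travel between them, which is why each function here depends only on its own input.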
From this project we obtained the software Perldoop. This software allows us to translate Perl code into Java so that it can run on a Hadoop cluster, with some restrictions. You can find more information and help on the project page.
GANESO (Gas Networks Simulator and Optimizator)
GANESO is software to simulate and optimize gas transport in pipelines by solving the numerical equations associated with this kind of problem.
This software is developed in Fortran, with Python for the graphical interface. The graphical interface is built on the open-source Quantum GIS, because it supports Python plugins.
My main tasks in this project are:
- Development of the graphical interface using Python
- Development of some program modules using Fortran
- Deployment of the program in the cloud as SaaS (Software as a Service)
Improving locality and affinity of the sparse matrix-vector product by using Morton representation
The optimization of the sparse matrix-vector product is still a challenge, especially in multicore systems, because of its irregular memory accesses. Several studies have analyzed different representations and techniques. One such representation is the Morton representation, which has proved to be advantageous in situations where locality and affinity are key aspects.
In this project, classic representations and the Morton representation are compared in terms of execution time and cache accesses. The program is written in C and uses the PAPI library to read hardware counters.
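The idea behind the Morton (Z-order) representation can be sketched as follows: the bits of the row and column indices are interleaved into a single key, and the nonzeros are stored in key order so that entries close together in the matrix tend to be close together in memory. This is an illustrative Python sketch, not the project's C implementation; the function names and the coordinate-triplet storage are assumptions made for the example.

```python
def morton_index(row, col, bits=16):
    """Interleave the bits of (row, col) into a single Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((col >> i) & 1) << (2 * i)       # column bits go to even positions
        key |= ((row >> i) & 1) << (2 * i + 1)   # row bits go to odd positions
    return key

def sort_by_morton(entries):
    """Reorder (row, col, value) triplets along the Z-order curve."""
    return sorted(entries, key=lambda e: morton_index(e[0], e[1]))

def spmv(entries, x, n_rows):
    """Sparse matrix-vector product y = A x over coordinate triplets."""
    y = [0.0] * n_rows
    for row, col, value in entries:
        y[row] += value * x[col]
    return y

entries = [(1, 1, 4.0), (0, 0, 1.0), (1, 0, 3.0), (0, 1, 2.0)]
ordered = sort_by_morton(entries)
y = spmv(ordered, [1.0, 1.0], n_rows=2)
```

Reordering the nonzeros does not change the result of the product; only the order in which `x` and `y` are touched changes, which is what affects cache behavior.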
Recognition of people from different points of view within a multiple cameras environment
Final project for my engineering degree in Computer Science. This project belongs to a larger project named “Multi-agent system for fast development of a guide robot in unknown environments”, carried out by Adrián Canedo, Víctor Álvarez and Xosé Manuel Pardo López, members of CiTIUS. My part in this project consists of determining how cameras are located within an indoor environment. To do this I used the people recognition features of the OpenCV library: an agent detects people in the camera images and, by detecting the same people in other cameras, I can tell whether these cameras are neighbors or not.
With this information about which cameras are neighbors, the camera positions are known, and the robots from the larger project can be guided by detecting situations within this indoor environment.
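The neighbor test described above can be sketched as follows. This is only an illustration under stated assumptions: sightings are taken to be already available as `(camera_id, person_id, timestamp)` tuples (in the project they would come from the OpenCV-based detection), and the `max_gap` time window used to decide that two sightings are related is a hypothetical parameter, not something described in the project.

```python
from itertools import combinations

def neighbor_cameras(sightings, max_gap=5.0):
    """Return the set of camera pairs that saw the same person within max_gap seconds.

    sightings: iterable of (camera_id, person_id, timestamp_seconds).
    """
    by_person = {}
    for camera, person, timestamp in sightings:
        by_person.setdefault(person, []).append((timestamp, camera))

    neighbors = set()
    for events in by_person.values():
        # Compare every pair of sightings of the same person.
        for (t1, c1), (t2, c2) in combinations(sorted(events), 2):
            if c1 != c2 and abs(t2 - t1) <= max_gap:
                neighbors.add(frozenset((c1, c2)))
    return neighbors

sightings = [
    ("A", "p1", 0.0),
    ("B", "p1", 3.0),   # same person shortly after camera A
    ("C", "p1", 20.0),  # too late to count as adjacent to A or B
    ("A", "p2", 1.0),
]
pairs = neighbor_cameras(sightings)
```

Each pair is stored as a `frozenset` so that (A, B) and (B, A) count as the same neighbor relation.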
CPM (Cluster Processes Monitor)
CPM, or Cluster Processes Monitor, is a tool to measure the performance of job processes inside a supercomputing cluster.
Typically, in a computing cluster, monitoring a parallel job and collecting statistics such as CPU consumption, memory usage or other data is a tedious task that involves inspecting large logs or modifying the application source code. Some solutions, such as Nagios or Zabbix, monitor the whole cluster, but they report the total resources of each node rather than the resources used by each process of a launched job. CPM allows the user to monitor data from individual processes in real time and to obtain plots and statistics from them very easily. It is composed of three programs that communicate with each other and work together across the network.
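To give an idea of what per-process monitoring involves, the sketch below reads the memory of a single process from the Linux `/proc/<pid>/status` file. It does not show how CPM itself collects data, which is not described here; the helper names are hypothetical and the approach assumes a Linux node.

```python
def parse_proc_status(text):
    """Parse the 'Key:<tab>value' lines of a Linux /proc/<pid>/status file."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def resident_memory_kb(fields):
    """Return VmRSS in kB, or None when the field is absent (e.g. kernel threads)."""
    raw = fields.get("VmRSS")
    return int(raw.split()[0]) if raw else None

def read_status(pid):
    """Read and parse the status file of one process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_proc_status(handle.read())

# Sample text in the format of /proc/<pid>/status, so the parser can be tried anywhere.
sample = "Name:\tsolver\nPid:\t4242\nVmRSS:\t   10240 kB\n"
fields = parse_proc_status(sample)
rss = resident_memory_kb(fields)
```

A monitor would call `read_status` periodically for each process of a job and send the values to a collector, instead of aggregating them per node.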
Matrix Market Suite
- Create dense symmetric matrices.
- Create dense vectors.
- Dense matrix-vector multiplication.
- Solve linear systems using the Conjugate Gradient method.
- Parallelization of the Conjugate Gradient method and the dense matrix-vector multiplication by using MPI.
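The Conjugate Gradient method listed above can be sketched for a small dense system as follows. This is a plain Python illustration, not the suite's implementation; the function names are hypothetical, and the method assumes a symmetric positive definite matrix.

```python
def mat_vec(a, x):
    """Dense matrix-vector product."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for a symmetric positive definite matrix a."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - a x, with x = 0 initially
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break            # residual norm is small enough
        ap = mat_vec(a, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

a = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution = conjugate_gradient(a, b)
```

The matrix-vector product and the inner products dominate the cost of each iteration, so they are the natural candidates for distribution across processes.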
Planned functionality:
- Parallelization of the Conjugate Gradient method and the dense matrix-vector multiplication by using CUDA.
- Implementation of new linear system solvers (Jacobi, etc.).
- Implementation of operations for sparse matrices.