Data Tsunami, Case Study Example

How to Keep the Tsunami from Overwhelming Us

On 14 May 2011, at the Innovations in Data-Intensive Astronomy workshop in Green Bank, West Virginia, the participants concluded that managing huge data sets requires a community effort and partnerships with national cyberinfrastructure programs. Moreover, archives will need to operate on smaller budgets. This can be achieved by investigating new technologies and innovating with them to provide better solutions. This essay addresses the issues facing the archives and their use.

Computing Infrastructure

Astronomy and the national cyberinfrastructure community have engaged with each other to exchange ideas and innovations. Advances in technology help optimize infrastructure, workflow performance and data processing for a growing range of tasks. In addition, the IT community has used the Montage image mosaic engine to develop infrastructure techniques such as task schedulers. However, these efforts are informal and need to be organized and developed further.
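To illustrate what a task scheduler does for a workflow such as a mosaic pipeline, the sketch below orders tasks in a dependency graph into "waves": every task in a wave depends only on earlier waves, so tasks within a wave could run in parallel. It uses Kahn's topological sort. The task names (`reproject_1`, `bg_model`, `coadd` and so on) and the `schedule_waves` function are hypothetical placeholders loosely modeled on mosaic steps (reprojection, background matching, co-addition); they are not Montage's actual module interface.

```python
from collections import defaultdict


def schedule_waves(deps):
    """Group tasks into waves using Kahn's topological sort.

    `deps` maps each task to the list of tasks it depends on. Every task in
    a wave depends only on tasks in earlier waves, so the tasks within one
    wave could be dispatched to workers concurrently.
    """
    # Collect every task, including ones that appear only as dependencies.
    tasks = set(deps)
    for parents in deps.values():
        tasks.update(parents)

    # Count unmet dependencies and record which tasks each task unblocks.
    indegree = {task: 0 for task in tasks}
    children = defaultdict(list)
    for task, parents in deps.items():
        for parent in parents:
            indegree[task] += 1
            children[parent].append(task)

    ready = sorted(t for t in tasks if indegree[t] == 0)
    waves = []
    scheduled = 0
    while ready:
        waves.append(ready)
        scheduled += len(ready)
        next_ready = []
        for task in ready:
            for child in children[task]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    # Tasks left unscheduled can only be explained by a dependency cycle.
    if scheduled != len(tasks):
        raise ValueError("dependency cycle detected")
    return waves


# Hypothetical mosaic workflow: reproject two images, fit their overlap,
# correct the backgrounds, then co-add everything into a single mosaic.
workflow = {
    "reproject_1": [],
    "reproject_2": [],
    "diff_1_2": ["reproject_1", "reproject_2"],
    "bg_model": ["diff_1_2"],
    "bg_correct_1": ["bg_model", "reproject_1"],
    "bg_correct_2": ["bg_model", "reproject_2"],
    "coadd": ["bg_correct_1", "bg_correct_2"],
}

for i, wave in enumerate(schedule_waves(workflow)):
    print(i, wave)
```

A real scheduler would also weigh task cost and data locality when assigning work to nodes; the wave grouping only captures which tasks are allowed to run together.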

Changes within Cultures

Knowledge of advanced IT is not yet disseminated well within the astronomical community. Information about IT advances is scattered across files and conference proceedings. To resolve this problem, an interactive online publication can be created that gathers all the information about technology in astronomy and the physical sciences. It is also important to change how astronomy acknowledges computational work, because this helps retain qualified personnel in the field. Finally, an information science community must be developed around science-driven infrastructure. A good example of such collaboration is SciDB, a database designed and optimized for scientific applications.

Changes in Education

In this environment, archives are developing a model in which they process data for end users and maintain the associated software. Self-teaching also helps in this regard. To make software engineering a compulsory part of graduate education, students are given a demonstration highlighting the advantages of such a program.

The software engineering syllabus covers software design and maintenance, including version control, documentation, and basic design principles for appropriate testing. The syllabus also includes:

  • Scripting languages
  • Writing portable code
  • Parallel-processing techniques
  • Principles of databases
  • Use of high-performance platforms such as clouds, clusters and grids
  • Programming in low-level languages

High-performance computing techniques and server load management must be taught by experienced instructors. Such a syllabus helps astronomers develop their own scalable code, and it also helps computer scientists work on next-generation applications. New teaching methods should replace the existing ones. One way to do this is to offer software engineering students an online class program that encourages contributions from the community. Software Carpentry is an open-source project of this kind.

At Louisiana State University, Frank Loffler teaches a graduate class on high-performance computing techniques that users can apply in their daily work. Students gain hands-on experience running simulation codes on the TeraGrid. This experience includes programming for modeling black holes, predicting the effects of hurricanes and optimizing oil and gas production from natural reservoirs.

Modernization in Serving and Discovering Data

Astronomy requires new data discovery methods and techniques that cope with the anticipated growth in data volume and support intelligent discovery of massive data sets across distributed archives (Jaschek, 1989). In particular, these methods must enable discovery of, and access to, data within petabyte-scale data sets. One example is searching for images at different wavelengths across a large area of the sky such as the Galactic Plane while avoiding excessive load on servers. The VAO (Virtual Astronomical Observatory), part of a worldwide effort to provide seamless international astronomical data discovery services, is developing such techniques. Its development focuses on an indexing service based on R-trees, which can support robust, scalable access to very large databases of astronomical imaging data sets (Nautical & Us Nautical, 2012). An R-tree is a tree data structure used to index multidimensional information; it is generally used to index database records in order to speed up access. In the current implementation, the indices are not stored in the database but in memory-mapped files located on a separate Linux cluster. This gives a speed-up of up to 1000 times compared with table scans. The service has been deployed on databases holding 2 billion records and terabytes of images (Nautical & Us Nautical, 2012), and it is operational in the Spitzer Space Telescope Heritage Archive and in the VAO Image and Catalog Discovery Service. Scaling these methods to petabyte-scale data is the essential next step. Such solutions tend to be more effective than expensive systems such as Geographical Information Systems (GIS).
GIS are also difficult to use in astronomy, where data lie on the celestial sphere and the footprints of data sets and instruments on the sky are generally simple geometric shapes (Feigelson & Babu, 2003).
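The following is a minimal sketch of how an R-tree prunes a footprint search. It bulk-loads a static R-tree with the Sort-Tile-Recursive (STR) packing method and answers rectangle queries. It is not the VAO implementation: it assumes flat (RA, Dec) rectangles in degrees, ignores spherical geometry and RA wrap-around, and keeps the tree in ordinary memory rather than in memory-mapped files. The names `Box`, `build_rtree` and `search`, and the image identifiers, are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Box:
    """Axis-aligned box in (RA, Dec) degrees; assumes no RA wrap-around."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other):
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)


def enclose(boxes):
    """Smallest box that covers all the given boxes."""
    return Box(min(b.xmin for b in boxes), min(b.ymin for b in boxes),
               max(b.xmax for b in boxes), max(b.ymax for b in boxes))


@dataclass
class Node:
    box: Box
    children: list = field(default_factory=list)  # child Nodes (internal node)
    items: list = field(default_factory=list)     # (Box, payload) pairs (leaf)


def str_pack(entries, capacity, get_box):
    """Sort-Tile-Recursive: slice entries by x-centre, sort each slice by
    y-centre, then cut into groups of at most `capacity` nearby entries."""
    n = len(entries)
    num_slices = math.ceil(math.sqrt(math.ceil(n / capacity)))
    slice_size = num_slices * capacity
    by_x = sorted(entries, key=lambda e: get_box(e).xmin + get_box(e).xmax)
    groups = []
    for s in range(0, n, slice_size):
        part = sorted(by_x[s:s + slice_size],
                      key=lambda e: get_box(e).ymin + get_box(e).ymax)
        groups.extend(part[g:g + capacity] for g in range(0, len(part), capacity))
    return groups


def build_rtree(items, capacity=4):
    """Bulk-load a static R-tree from (Box, payload) pairs."""
    if not items or capacity < 2:
        raise ValueError("need at least one item and capacity >= 2")
    level = [Node(enclose([b for b, _ in grp]), items=grp)
             for grp in str_pack(items, capacity, lambda e: e[0])]
    while len(level) > 1:  # pack nodes into parents until one root remains
        level = [Node(enclose([nd.box for nd in grp]), children=grp)
                 for grp in str_pack(level, capacity, lambda nd: nd.box)]
    return level[0]


def search(node, query):
    """Return payloads whose boxes intersect `query`, skipping any subtree
    whose enclosing box does not overlap the query at all."""
    if not node.box.intersects(query):
        return []
    if not node.children:  # leaf node: test the stored footprints directly
        return [p for b, p in node.items if b.intersects(query)]
    hits = []
    for child in node.children:
        hits.extend(search(child, query))
    return hits


# Hypothetical 10 x 10 grid of 1-degree image footprints.
footprints = [(Box(i, j, i + 1, j + 1), f"img_{i}_{j}")
              for i in range(10) for j in range(10)]
tree = build_rtree(footprints, capacity=4)
print(sorted(search(tree, Box(2.5, 2.5, 3.5, 3.5))))
# ['img_2_2', 'img_2_3', 'img_3_2', 'img_3_3']
```

The gain over a table scan comes from the pruning in `search`: whole subtrees whose enclosing boxes miss the query region are never visited, so a small sky region touches only a few nodes.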

How Do Astronomers Archive the Overflowing Data?

Astronomers are gathering more data than ever before. It is important to identify the best practices for managing this ever-growing volume of data. Astronomy is already awash with data: one petabyte (PB) of data is currently accessible electronically.

Research on Developing Technologies

The number of researchers working on data archiving in data centers is growing. Their aim is to reduce computational and financial costs. One study examined the applicability of graphics processing units (GPUs) to astronomy. GPUs are designed to accelerate the rendering of images on a display device, and they incorporate many floating-point processors. The researchers found that GPUs are suited to single-precision calculations, whereas astronomy actually requires double-precision calculations. In addition, the rate of data transfer to and from the GPU is limited. Another line of research concerns cloud computing in astronomy. On commercial clouds, the best-suited applications are memory-intensive ones, for which clouds offer a competitive advantage through low-cost processing.
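A small illustration of why single precision can fall short for astronomy follows. It does not run on a GPU; it uses Python's standard `struct` module to round a double-precision value to the nearest IEEE 754 single-precision value. The Julian Date and right ascension below are arbitrary example values, and `to_float32` is a hypothetical helper name.

```python
import struct


def to_float32(x):
    """Round a Python float (IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# A Julian Date (in days) of the kind used to time-stamp observations.
# Near 2.4 million, single-precision values are spaced 0.25 day apart.
jd = 2456000.3
print(jd, to_float32(jd))  # the time of day collapses to the nearest quarter day

# A right ascension in degrees, and the angular error single precision introduces.
ra = 123.456789012
err_arcsec = abs(to_float32(ra) - ra) * 3600.0
print(f"RA rounding error: {err_arcsec:.4f} arcsec")
```

In double precision the spacing between representable values at that Julian Date is 2^-31 day, roughly 40 microseconds, which is why timing and astrometric work generally keeps double precision.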


Astronomy has started to produce volumes of data that can only be managed, served and processed with the latest IT techniques. This essay has discussed the tools and techniques needed for the next generation of astronomical archives. It has also discussed data archiving that incorporates new technologies, partnerships between astronomers and computer scientists, and training programs in advanced software engineering.


References

Feigelson, E. D., & Babu, G. J. (2003). Statistical challenges of astronomy. Springer.

Jaschek, C. (1989). Data in astronomy. Cambridge University Press.

Nautical, A. O., & Us Nautical, A. O. (2012). Astronomical almanac for the year 2013 and its companion, the astronomical almanac online. Bernan Assoc.