Information Science and Technology

The staggering increases in the rate at which data is being produced mean that the amount of information that is effectively lost will increase just as fast, unless data analysis and information science become major undertakings of the scientific community.

Our ability to collect and generate data far exceeds our ability to extract knowledge from it. An enormous amount of research has focused on increasing computational processing power, and scientific models and simulations have kept pace by growing in size and complexity, generating mountains of data. All this data contains an enormous amount of information, but very little of it ends up contributing to knowledge. Relatively little of the scientific community's resources have been dedicated to learning how to analyze the data it produces effectively. As a result, experiments and simulations often yield far less knowledge than they could if we were better able to analyze the data and extract its information content.

Information Science and Technology addresses the challenge of transforming the deluge of data into information and, ultimately, better, more predictive models. These are IS&T priorities:

  • After decades of exponential growth in power, existing computer architectures still fail to match the ability of the mammalian brain to interpret, respond to, and learn from natural sensory inputs. More fundamentally, developments in synthetic biology, biochemistry, neuroscience, or optics show that complex computations are omnipresent in physical systems but cannot always be easily described or reproduced within standard computing models. IS&T research reconsiders silicon-based computing and all of its implications.
  • Progress in the speed, resolution, and quality of analysis increasingly depends on the marriage between advanced hardware and algorithms tailored to specific applications.
  • Progress in the quality of our models depends on the ability to rapidly inform them with the increasing amounts of observational data we are able to collect.
  • Where data is largely symbolic rather than numeric, algorithms must approximate the functionality of the human brain. This will come through investments in machine learning and inference, pattern recognition, image processing, speech and language processing, brain modeling, and so on. Computers can do better at this problem than they currently do, but this type of analysis generally assists a human in the loop of cognitive analysis and decision making; IS&T is therefore very interested in the interaction between the researcher and the analysis tools.
  • IS&T programs will fundamentally change the way we approach the analysis of large data sets. Streaming data algorithms, for example, can perform data analysis as the data is collected or generated, and computational infrastructures can make these algorithms possible and efficient. This eliminates the need to store the data, transmit it to and from the storage location, and do expensive computations on huge data aggregates. In addition, mathematical methods are emerging that move beyond the notion of compression as a post-processing step: instead of taking a large data set, performing a large-scale computation to compress it, and then throwing the original data away, we can measure a compressed version directly. The large data set need never exist. These ideas can be implemented in a streaming fashion, so that data is compressed without our ever having all of it in hand.
  • Efficient algorithms and computational infrastructures that would allow simulations and experiments to be essentially self-analyzing would make the simulations and experiments many times more valuable.
  • In order to match the rate of productivity to the rate of growth in computing capability, a tremendous investment needs to be made in making the complexity and dynamics of the computing environment transparent to the researcher. The usability of the computing resources and the agility with which researchers can use them to innovate and meet new challenges need to be a high priority. We need scalable, flexible, robust, and extensible software environments for the analysis and visualization of massive data streams, environments that enable the organization, exploration, and modeling of multiple streams over any spatial resolution and any unit of analysis.
  • We currently generate data sets that overwhelm the capabilities of an individual or even a small team. IS&T develops infrastructure that facilitates large collaborative efforts to analyze data. This includes establishing nationally and internationally recognized, standardized data structures for common types of data sets organized around disciplines and problems. The next level of infrastructure is the creation of virtual organizations that motivate and facilitate collective action to assemble and analyze enormous, complex datasets.
  • Metrics become increasingly important as we seek to create meaningful abstractions from complex and large amounts of data. We need disciplined methods for characterizing features of data sets and defining a method for comparative analysis of those features.
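
The streaming idea in the priorities above can be made concrete with a small sketch. The example below is only an illustration, not an IS&T tool: it uses Welford's online algorithm, a standard technique chosen here as a stand-in for streaming analysis in general, to keep a running mean and variance while values arrive one at a time. The names `RunningStats` and `simulated_stream` are hypothetical, and the random source is an assumed placeholder for an instrument or simulation.

```python
import random


class RunningStats:
    """Running mean and sample variance via Welford's online algorithm.

    Only three numbers are kept in memory, however long the stream is.
    """

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new value into the summary, then discard it."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 with fewer than two values."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def simulated_stream(n, seed=0):
    """Hypothetical data source: yields n values without building a list."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.gauss(10.0, 2.0)


if __name__ == "__main__":
    stats = RunningStats()
    for value in simulated_stream(100_000):
        stats.update(value)  # analysis happens as each value is generated
    print(stats.n, round(stats.mean, 2), round(stats.variance, 2))
```

Each value is seen once and never stored, which is the property the streaming bullet describes. Richer summaries such as histograms, sketches, or compressed measurements would follow the same update-and-discard pattern, though their details are beyond this sketch.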

We are not expending nearly as much effort on the analysis of data as on its generation, and current trends are widening the gap. More data are being created than can feasibly be stored, and even stored datasets are becoming too large to feasibly transmit from one place to another, sometimes even within a given computational system. The result is that increasing amounts of data are simply lost, and much of what remains goes unanalyzed. The dictum of the effective office manager is "Never touch a piece of paper more than once." Yet our experiments and simulations are generating a data volume greater than that contained on all the paper in the Library of Congress. How many times can we expect to be able to touch each byte? IS&T focuses on analysis, matching the rapid growth in hardware capability with significant investment in effective tools for generating knowledge.

The IAS program in IS&T currently touches the following disciplines:

  • Computational Biology
  • Networks
  • Image Processing
  • Global Energy Resources
  • Astrophysics and Cosmology

For more information, contact:

Katharine Chartrand

Information Science and Technology Programs

New Mexico Consortium

knc@newmexicoconsortium.org
