The Boulder Earth and Space Science Informatics Group (BESSIG) aims to galvanize and support networking and collaboration among Earth and Space Science data users, data providers, data managers, and middleware providers, especially those in the Boulder, Colorado area. Topical areas include issues of scientific data representation, management, discovery, access, analysis, visualization, citation, transparency, and the infrastructure to support those efforts. The end goal is to improve the usage and thus the value of scientific data, thereby improving our understanding of our Earth and its systems.
We continually seek topics and speakers of interest. If you have an idea for a relevant topic you would like to see presented, please contact bessig dot info at lasp dot colorado dot edu to discuss setting up a presentation.
Our next meeting is Monday, February 22, at 4:30 p.m.
We’ll be meeting at Alfalfa’s in the Community Room, located at 1651 Broadway.
Monday, February 22nd, 4:30 – 6:00 PM
Evaluating the Readiness of Scientific Software, Starting in the BESSIG Community
- Soren Scott, The Ronin Institute for Independent Scholarship
Science and research code is currently a hot topic. We see ongoing discussions around citations and credit for writing science software, sustainability and transparency of such software, and reproducibility of scientific results, which generally means runnable code.
One aspect of those discussions involves evaluating science code against recommended practices and understanding project maturity as projects progress from prototype to operational systems. The ESIP Products & Services committee has begun this discussion through the recently initiated, NASA-supported AIST TRL Evaluation project, which aims to determine independent criteria for evaluating the technical readiness of a project, including its software.
With that as a starting point, and leveraging the work of the Software Sustainability Institute, this meeting will be not just a discussion, but a brainstorming session on science software evaluation in the BESSIG community. The session will be led by Soren Scott. We especially want active science/research software developers to participate. If you develop scientific software, please come.
Possible topics include:
- What are basic software practices that every project should use?
- What additional practices are applicable to higher readiness or maturity levels?
- Is software maturity the same as software readiness?
- What would a good science/research code linter look like? (What is Linting?)
- How do these practices align with generally accepted good code practices?
- How do you measure progress?
- What matters to you in software evaluation and what kind of feedback from such a process would help you?
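To make the linter question above a little more concrete, here is a minimal sketch of one check such a tool might perform: flagging functions that have no docstring. The function name `find_undocumented_functions` and the choice of rule are illustrative assumptions only; they are not part of any ESIP or BESSIG product, and a real linter would combine many such checks.

```python
import ast


def find_undocumented_functions(source: str) -> list[str]:
    """Return the sorted names of functions in `source` that lack a docstring."""
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        # Cover both regular and async function definitions.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical snippet of research code to check.
    sample = '''
def regrid(field):
    return field

def load(path):
    """Read a data file."""
    return path
'''
    for name in find_undocumented_functions(sample):
        print(f"function '{name}' has no docstring")
```

A check like this only inspects the syntax tree; it says nothing about whether the documentation is accurate, which is one reason the discussion questions above matter.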
As background material, Soren has prepared some Technology Evaluation Resources, a set of resources and working products related to technology evaluation efforts in the ESIP and BESSIG communities. Thanks, Soren! Also, if this topic moves you, consider joining the ESIP cluster (it’s free!) and helping to move this effort forward.
4:30pm – 5:30pm Presentation and discussion
5:30pm – 6:00pm Social
PAST BESSIG MEETINGS
Wednesday, November 30th, 5:00 – 6:30 PM
Dynamic Data Citations: The Current State
- Ruth Duerr, The Ronin Institute for Independent Scholarship
“Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record. In other words, data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice and is part of the scholarly ecosystem supporting data reuse.”
Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014 [https://www.force11.org/datacitation].
Citation of research data is becoming the norm. Government agencies like OSTP, NSF, NASA and NOAA require data to be fully and openly accessible, and have articulated data citation as a means to achieving open and equal access to data for all interested parties. Also, publishers are requiring manuscript submitters to provide access to the data sets used in their published research in order to link publications with the data that underlie them. Citation of research data is a key component of these goals. Principles of data citation have reached some degree of maturity, but significant issues remain. In particular, how does one cite data that is dynamic, e.g., created via subsetting, aggregation, or other on-demand, server-side functionality?
The Research Data Alliance (RDA) Working Group on data citation has been working in this area, and the Federation of Earth Science Information Partners (ESIP Federation) held a workshop on Dynamic Data Citation this year. In this talk, Ruth will describe the current state of research data citation and the results of these recent efforts.
As a data scientist/engineer, Ruth Duerr has been interested in four fields of inquiry: science data management, digital archives management, records management, and digital library science. All four fields are developing separately, yet share similar problem spaces—how to make available and preserve digital data and information over time. Her research interests involve nearly all aspects of data stewardship. She currently is a PI and/or Project Manager for several ongoing and recent data management and cyberinfrastructure projects funded by NSF, NASA and NOAA.
Wednesday, September 30th, 4:00 – 6:00 PM
Rescuing Vanishing Heritage Data: Why, How, Who, Where, & When? A brief review
- Dr. Elizabeth Griffin, Dominion Astrophysical Observatory, Victoria, Canada
Data Rescue has never been more relevant than today, when quantitative information on matters pertaining to long-term climate change and other environmental properties of Planet Earth is sorely needed. Yet progress is seriously patchy. We review what is driving current activities, the status of group efforts, the hurdles, and the outlook.
Dr. Elizabeth Griffin is a research astrophysicist who was educated in the UK, carried out research in Cambridge (UK) for 30 years and then in Oxford, and is now continuing those projects at the Dominion Astrophysical Observatory (Victoria, Canada). Early experience with photographic recording of data (before computers and CCDs were on the horizon) encouraged her to pioneer the rescue (through programs of digitization) of the information on many of Astronomy’s plates regarding stellar variability, and from there to team up with other similar projects in a diversity of scientific fields. She co-Chairs (with David Gallaher) an international WG for Data Rescue, and meets all types of reactions to Data Rescue, from hugely supportive to downright skeptical.
Following Elizabeth’s presentation to the BESSIG, there will be a group discussion of opportunities and logistics for the workshop. As it offers the chance to bring data programs together from across the Front Range, we would like to identify workshop organizing committee members and workshop partners who can take the lead on planning and supporting a fantastic international event. Please bring your great ideas and data rescue experiences to share in this exciting conversation!
Monday, September 21st, 4:30 – 6:00 PM
Ontology Reuse in Geoscience Semantic Applications
- Matt Mayernik, Project Scientist and Research Data Services Specialist, UCAR Library
Interplays between local ontology development and the establishment of wider ontology connections are fundamental to the Semantic web. This presentation will discuss the goals and work of the EarthCollab project, focusing on ontology selection, consolidation, and reuse driven by geoscience use cases. The EarthCollab project is a collaboration between UCAR, Cornell University, and UNAVCO to leverage semantic technologies to manage and link geoscientific information and resources. We are using the VIVO semantic web software suite to support the discovery of information, data, and potential collaborators within the geodesy and polar science communities.
We will present our ontology design approach, which heavily emphasizes ontology reuse, and how the different needs of each use case have informed our ontology selection and design. Specifically, we will discuss our approach to bringing together the VIVO-Integrated Semantic Framework (VIVO-ISF) ontology, the Global Change Information System (GCIS) ontology, and the Data Catalog (DCAT) ontology, among others, using the VIVO application. We are interested in engaging the BESSIG community in discussions around 1) key decision points for new semantic web applications in deciding when to reuse existing ontologies and when to develop original ontologies, and 2) the benefits and drawbacks of using and expanding existing ontologies.
Monday, August 31st, 4:00 – 6:00 PM
Leveraging Internet Identity for Scientific Collaborations
- Ken Klingenstein, Evangelist, Digital Identity & Privacy, Internet2
In the last several years there has been rapid development of an identity layer for the Internet. Efforts in government, R&E, businesses and among social identity providers are creating an infrastructure of identity and attributes that is being leveraged to access supercomputers, social sites, health care providers, federal research agencies, instrumentation and databases, cloud based storage and compute services, etc.
The two major areas in this work are federated identity, which allows local identities, authentication and attributes to be used Internet-wide, and collaboration platforms, which allow virtual organizations and other multi-institutional efforts to build on federated identity and seamlessly use a growing pool of collaboration applications (wikis, listservs, file sharing, code management tools, command line apps, etc).
This talk will discuss the current state of federated identity, including international inter-federation and US government activities, and how federated identities are being used in leading-edge US science communities. It will then present the emergence of collaboration platforms, and their ability to integrate access control and group management across collaboration applications using open standards. Demos might happen; interruptions and comments most welcome.
Thursday, July 9th, 4:00 – 5:30 PM
Identifiers and Relationships
- Joe Hourcle, Goddard Space Flight Center
At a recent meeting, I came to realize that there are quite a few people who have more recently come into the field of data informatics, and have missed out on much of the discussion over the last decade on data identifiers. In the last few weeks two papers were published by some of the same co-authors that took contrary positions on the presentation of identifiers, although that was not a focus of either paper.
I will give an overview of some of the issues regarding identifiers for data (both those that I think are resolved and not), the need for vocabulary and standards to describe what is being identified, and the implications for data citation and describing other data relationships.
Joe Hourcle is a programmer/analyst for the Solar Data Analysis Center at Goddard Space Flight Center, working as a programmer / DBA / sysadmin / cataloger / whatever else on the Virtual Solar Observatory. He has an interest in classifying things and naming concepts — he has been working with Todd King on a (still unpublished) vocabulary to discuss data systems (http://virtualsolar.org/vocab), and back before he knew anything about ontologies & controlled vocabularies, added the topics to fark.com. He would also like to remind you that the crew neck means that most t-shirts qualify as a ‘shirt with a collar’.
Monday, Oct. 6th, 4:30 – 6 PM
What’s an Ontology and What Should I Do With It?
- Beth Huffer, Lingua Logica
The word “ontology” is used to refer to a variety of different artifacts, from controlled vocabularies that serve as glossaries, to formal ontologies that serve as data schemas for graph databases and/or deductive reasoning systems. This talk will focus on use cases for formal ontologies, with a demonstration of and presentation on ODISEES (Ontology-Driven Interactive Search Environment for Earth Science) which was recently released in beta by the Atmospheric Science Data Center at NASA Langley Research Center. ODISEES provides a parameter-level search environment for discovering ASDC data resources, enabling users to specify a precise set of criteria and get a set of results that exactly match those criteria. Following an overview of the technology behind ODISEES, Beth will discuss additional use cases for formal ontologies of the sort driving ODISEES.
Wednesday, May 28th, 3:30 – 5 PM
Who’s Afraid of File Format Obsolescence? Evaluating File Format Endangerment Levels and Factors for the Creation of a File Format Endangerment Index
- Heather Ryan, University of Denver Library and Information Science
Much digital preservation research has been built on the assumption that file format obsolescence poses a great risk to the continued access of digital content. In an endeavor to address this risk, a number of researchers created lists of factors that could be used to assess risks associated with digital file formats. My research examines these assumptions about file format obsolescence and file format evaluation factors with the aim of creating a simplified file format endangerment index.
This study examines file format risk under the new lens of ‘file format endangerment,’ or the possibility that information stored in a particular file format will not be interpretable or renderable in human accessible means within a certain timeframe. Using the Delphi method in two separate studies, this exploratory research collected expert opinion on file format endangerment levels of 50 test file formats; and collected expert opinion on relevance of 28 factors as causal indicators of file format endangerment.
Experts expressed the belief that generally, digital information encoded in the rated file formats will be accessible for 20 years or more. This indicates that file format experts believe that there is not a great deal of short-term risk associated with encoding information in the rated file formats, though this does not preclude continued engagement with preservation activities for these and other file formats. Furthermore, the findings show that only three of the dozens of file format evaluation factors discussed in the literature exceeded an emergent threshold level as causal indicators of file format endangerment: ‘Rendering Software Available,’ ‘Specifications Available,’ and ‘Community/3rd Party Support.’ Consequently, these factors are ideal candidates for use in a simple file format endangerment index that can be used to assess endangerment levels of any file format.
The findings of this study have implications for further exploration of file format endangerment in specific digital information creation domains. In particular, applying this model to file formats created by and used in the Earth and Space Science communities will both strengthen the model and produce valuable insight into format-centric Earth and Space Science data creation and management practices. This insight can then be applied to risk assessment and subsequent actions to support continued access to datasets over time.
Wednesday, April 16th, 4:15 – 6 PM
An Easy Bake Semantic Metadata Repository for Scientific Data
Note that we’ll start at 4:15 this month due to our speaker’s schedule.
- Mik Cox, Tyler Traver, Anne Wilson, Doug Lindholm, Laboratory for Atmospheric and Space Physics (LASP), Don Elsborg, CU Faculty Affairs
This presentation will discuss the use of open source tools and the tasks that remained to create a semantically enabled metadata repository.
The LASP Interactive Solar Irradiance Data Center, LISIRD, is a web site that serves the lab’s solar irradiance and related data products to the public. LISIRD provides information about the data it offers as part of its web page content, embedded in static HTML. At the same time, other LASP web sites also provide the same information, such as sites pertaining to specific missions or education and outreach. Keeping data set information updated and in sync across web sites is a problem. Nor is the information interoperable with emerging search and discovery tools.
To address this and other issues, we created a semantically enabled metadata repository that holds information about our data. In conjunction, we prototyped a new implementation of LISIRD that dynamically renders page content, pulling metadata from the repository and including in the page current, vetted metadata from a single, definitive source. Other web pages can similarly pull this information if they choose. Additionally we can now offer new semantic browse and search capabilities, such as search of data sets by type (currently spectral solar irradiance, total solar irradiance, and solar indices) or over a particular spectral range provided by the user.
We can also render the metadata in various formats understandable to other communities, such as SPASE for the heliophysics community and ISO for the international community. This will allow us to federate with sites that use those formats, allowing broader discovery of our data.
To date, metadata management at LASP has generally been done on a per project, ad hoc basis. We are building applications on top of the repository that provide CRUD (create, read, update, delete) capabilities for metadata records to metadata ‘owners’ and ‘curators’. We expect this to help data managers to store and manage their metadata in a more rigorous fashion should they choose to use it.
With these tools and some student time (though our students are exceptional) we are achieving significantly increased capabilities at a relatively low cost. We believe this tool combination could help projects with limited resources achieve similar capabilities to manage and provide access to metadata.
And, if that’s not easy-bake enough for you, try this PC EZ-Bake Oven, made especially for geeks: http://www.thinkgeek.com/stuff/41/ezbake.shtml.
Tuesday, March 18th, 4:00 – 6 PM
Earth System CoG and the Earth System Grid Federation: A Partnership for Improved Data Management and Project Coordination
- Sylvia Murphy, Luca Cinquini, Cecelia DeLuca, Allyn Treshansky, NOAA/CIRES
The Earth System CoG Collaboration Environment, led by a NOAA ESRL/CIRES team, is partnering with the DOE-led Earth System Grid Federation (ESGF) data archive to deliver a capability that will enable users to store, federate, and search scientific datasets, and manage and connect the projects that produced those datasets.
ESGF is an international network of data nodes that is used to host climate data sets, including the model outputs from the Coupled Model Intercomparison Project (CMIP), which supported the Intergovernmental Panel on Climate Change (IPCC) assessment reports. ESGF data nodes are federated, so that all data holdings are visible from any of the installation sites. An ESGF data node is now installed at NOAA’s Earth System Research Laboratory (ESRL). It currently hosts data from the Dynamical Core Model Intercomparison Project (DCMIP) and Twentieth Century Reanalysis data from ESRL’s Physical Sciences Division.
CoG is a collaboration environment and connective hub for networks of projects in the Earth Sciences. It hosts software development projects, model intercomparison projects, and short university-level courses. It includes a configurable search to data on any ESGF node, metadata collection and display, project-level wikis, and a host of other capabilities. There are 74 projects currently using the system.
CoG is partnering with the international Earth System Model Documentation (ES-DOC) project, funded by both NOAA and the EU’s Infrastructure for the European Network for Earth System Modeling (IS-ENES) project. ES-DOC is developing tools that capture, display, and compare Earth system model metadata. This information can be linked directly from a CoG project or attached to specific datasets in the ESGF node.
This presentation will provide an overview of both CoG and ESGF, demonstrate data discovery and download, and highlight key CoG capabilities using relevant example projects.
Wednesday, February 19th, 4:00 – 6 PM
Accessing Data Instead of Ordering Data: A New Normal
- Michael Little, Advanced Development Systems Engineer at the Atmospheric Science Data Center (ASDC)
Mike will describe how the new generation of research objectives will need to avoid staging data locally from multiple modeling and observational repositories. Rather, new access methods will present a machine-to-machine interface which permits codes and software applications to retrieve small increments of data continuously as part of the processing.
The ASDC’s Data Access architecture will be described with a particular emphasis on iRODS as one of the most promising tools for remote access to data held in earth science data centers.
Wednesday, January 22nd, 4:15 – 6 PM
Deep Carbon Observatory – Data Science and Data Management Infrastructure Overview and Demonstration
- Patrick West, Rensselaer Polytechnic Institute
The Deep Carbon Observatory (DCO) brings together hundreds of organizations and individuals from all around the world, spanning a great many scientific domains with a focus on Carbon. The DCO Data Science team is anticipating the generation of terabytes of information in the form of documents, scientific datasets from level 0 to data products and visualizations, information about events, people, and organizations, and more. So how do we keep track of all of this information, manage the information, and disseminate the information?
In order to organize all of this information and provide the research community the tools necessary to collaborate and do their research, the DCO Data Science team is putting together a suite of tools that will integrate all of these components in a seamless, distributed, heterogeneous environment. This presentation and demonstration will provide an overview of the work that we, the DCO Data Science team, are doing to provide such an environment.
Wednesday, November 20, 4 – 6 PM
Improving Science with Open Formats and High-Level Languages: Python and HDF5
- Andrew Collette, Laboratory for Atmospheric and Space Physics (LASP)
This talk explores how researchers can use the scalable, self-describing HDF5 data format together with the Python programming language to improve the analysis pipeline, easily archive and share large datasets, and improve confidence in scientific results. The discussion will focus on real-world applications of HDF5 in experimental physics at two multimillion-dollar research facilities: the Large Plasma Device at UCLA, and the NASA-funded hypervelocity dust accelerator at CU Boulder. This event coincides with the launch of a new O’Reilly book, Python and HDF5: Unlocking Scientific Data, complimentary copies of which will be available for attendees.
As scientific datasets grow from gigabytes to terabytes and beyond, the use of standard formats for data storage and communication becomes critical. HDF5, the most recent version of the Hierarchical Data Format originally developed at the National Center for Supercomputing Applications (NCSA), has rapidly emerged as the mechanism of choice for storing and sharing large datasets. At the same time, many researchers who routinely deal with large numerical datasets have been drawn to Python by its ease of use and rapid development capabilities.
Over the past several years, Python has emerged as a credible alternative to scientific analysis environments like IDL or MATLAB. In addition to stable core packages for handling numerical arrays, analysis, and plotting, the Python ecosystem provides a huge selection of more specialized software, reducing the amount of work necessary to write scientific code while also increasing the quality of results. Python’s excellent support for standard data formats allows scientists to interact seamlessly with colleagues using other platforms.
Wednesday, October 23, 4 – 6 PM
There is more to conservative interpolation—interpolating edge and face centered fields in the geo-sciences
Regridding of data is a common problem faced by many scientific software developers. If regridding is part of your world, this talk may be of interest to you.
- Alexander Pletzer, Tech-X
Interpolation is one of the most widely used postprocessing tasks, according to a survey of Ultrascale Visualization Climate Data Analysis Tools (UV-CDAT) users. Most geo-postprocessing tools (UV-CDAT, NCL, Ferret, etc) support a choice of both bilinear and conservative regridding with conservative interpolation guaranteeing that the total amount of “stuff” (energy, water, etc) remains unchanged after regridding. The SCRIP and ESMF are examples of libraries implementing these interpolation methods.
We argue that the type of interpolation is dictated by the type of field and that cell-centered fields require conservative interpolation whereas nodal fields require bilinear (or higher order) interpolation. Moreover, the wind velocity fields used by finite-volume atmospheric codes, which are neither cell-centered nor nodal but face-centered (Arakawa D staggering), require different interpolation formulas. Interpolation formulas of face-centered and edge-centered (Arakawa C) fields have been known as Whitney forms since 1957 and are widely used in electromagnetics. We present interpolation methods new to the geo-sciences that conserve flux and line integrals for Arakawa D and Arakawa C staggered fields, respectively.
This talk should be of interest to anybody who needs to regrid velocity and other vector fields whose components are staggered with respect to each other.
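For readers new to the idea, the conservation property mentioned in the abstract can be illustrated with a minimal one-dimensional sketch. This is a generic first-order, overlap-weighted remapping of cell-centered values, written for illustration only; it is not the face-centered or edge-centered method presented in the talk, and it does not use the SCRIP or ESMF APIs. The function names and sample grids are assumptions.

```python
def conservative_regrid_1d(src_edges, src_values, dst_edges):
    """Remap cell-averaged values from source cells onto destination cells.

    Each destination cell receives the overlap-weighted average of the source
    cells it intersects, so the integral (sum of value * cell width) is
    preserved when both grids cover the same interval.
    """
    dst_values = []
    for j in range(len(dst_edges) - 1):
        lo, hi = dst_edges[j], dst_edges[j + 1]
        total = 0.0
        for i, value in enumerate(src_values):
            # Length of the intersection between source cell i and target cell j.
            overlap = min(hi, src_edges[i + 1]) - max(lo, src_edges[i])
            if overlap > 0:
                total += value * overlap
        dst_values.append(total / (hi - lo))
    return dst_values


def integral(edges, values):
    """Total amount of 'stuff': sum of cell value times cell width."""
    return sum(v * (edges[i + 1] - edges[i]) for i, v in enumerate(values))


if __name__ == "__main__":
    # Hypothetical grids: four unit-width source cells, two uneven target cells.
    src_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
    src_values = [1.0, 3.0, 2.0, 6.0]
    dst_edges = [0.0, 1.5, 4.0]

    dst_values = conservative_regrid_1d(src_edges, src_values, dst_edges)
    print("regridded values:", dst_values)
    print("source integral: ", integral(src_edges, src_values))
    print("target integral: ", integral(dst_edges, dst_values))
```

The two integrals agree to rounding error, which is the sense in which the total is unchanged after regridding. Pointwise bilinear interpolation of the same data offers no such guarantee.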
Wednesday, September 18, 4 – 6 PM
Strategies, motivations, and influencing adoption of testing for scientific code
“Code without tests is bad code. It doesn’t matter how well written it is; it doesn’t matter how pretty or object-oriented or well-encapsulated it is. With tests, we can change the behavior of our code quickly and verifiably. Without them, we really don’t know if our code is getting better or worse.” -Michael C. Feathers, “Working Effectively with Legacy Code”
A strong statement, but it does bring home the vital role of testing in software development.
- Ian Truslove, Erik Jasiak, NSIDC
Computation and programming are increasingly inescapable in modern Earth Sciences, but scientists and researchers receive little or no formal software engineering or programming training. At the same time, research into the reproducibility of academic papers has exposed disappointingly low rates of repeatability, and high-profile retractions due to computational or data errors have increased the onus on researchers to write repeatable, reliable, even reusable programs; in other words, to “write better code”.
Software engineering has plenty to say on the matter of “better code”: metrics, methodologies, processes, tools… Of course, none are indisputable and none provide absolute guarantees. One seemingly obvious technique – testing – has enjoyed a renaissance in incarnations such as unit testing, and with approaches such as test-driven development (TDD) and behavior-driven development (BDD).
Based on our experience at the National Snow and Ice Data Center (NSIDC) with unit testing, TDD and BDD, we present a set of recommendations to scientific and research programmers about some techniques to try in their day to day programming, and possibly provide some inspiration to aim for more comprehensive approaches such as BDD. We will highlight some use cases of various types of testing at the NSIDC, discuss some of the cultural and management changes that occurred for programmers, scientists and project managers to consider and adopt processes such as TDD, make recommendations about how to introduce or expand rigorous code testing practices in your organization, and discuss the likely benefits in doing so.
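As a small illustration of the kind of unit test the abstract has in mind, here is a minimal sketch using Python's standard `unittest` module. The function `mean_ignoring_missing` and the `-9999.0` missing-data flag are hypothetical examples chosen for this sketch; they are not taken from NSIDC code.

```python
import math
import unittest


def mean_ignoring_missing(values, missing=-9999.0):
    """Average the values, skipping entries equal to the missing-data flag."""
    valid = [v for v in values if v != missing]
    if not valid:
        # No usable data: signal that explicitly rather than dividing by zero.
        return math.nan
    return sum(valid) / len(valid)


class MeanIgnoringMissingTest(unittest.TestCase):
    def test_skips_missing_flag(self):
        self.assertAlmostEqual(mean_ignoring_missing([1.0, -9999.0, 3.0]), 2.0)

    def test_all_missing_returns_nan(self):
        self.assertTrue(math.isnan(mean_ignoring_missing([-9999.0, -9999.0])))

    def test_empty_input_returns_nan(self):
        self.assertTrue(math.isnan(mean_ignoring_missing([])))
```

Saved in a file, these tests can be run with `python -m unittest <filename>`. In a test-driven workflow the three test methods would be written first, and the function filled in until they pass.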
Wednesday, August 21, 4 – 6 PM
The Research Data Alliance: Creating the culture and technology for an international data infrastructure
- Mark Parsons, Managing Director, Research Data Alliance/U.S.
All of society’s grand challenges—be it addressing rapid climate change, curing cancer and other disease, providing food and water for more than seven billion people, understanding the origins of the universe or the mind—all of them require diverse and sometimes very large data to be shared and integrated across cultures, scales, and technologies. This requires a new form and new conception of infrastructure. The Research Data Alliance (RDA) is creating and implementing this new data infrastructure. It is building the connections that make data work across social and technical barriers.
RDA launched in March 2013 as an international alliance of researchers, data scientists, and organizations to build these connections and infrastructure to accelerate data-driven innovation. RDA facilitates research data sharing, use, re-use, discoverability, and standards harmonization through the development and adoption of technologies, policy, practice, standards, and other deliverables. We do this through focussed Working Groups, exploratory Interest Groups, and a broad, committed membership of individuals and organizations dedicated to improving data exchange.
What data sharing problem are you trying to solve? Find out how RDA can help.
Wednesday, July 24, 4 – 6 PM
HDF and The Earth Science Platform
- Ted Habermann, The HDF Group
Interoperable data and understanding across the Earth Science community requires convergence towards a standard set of data formats and services, metadata standards, and conventions for effective use of both. Although large legacy archives still exist in netCDF3, HDF4, and many custom formats, we have achieved considerable convergence in the data format layer with the merger of the netCDF4 and HDF5 formats. The way forward seems clear as more groups in many disciplines join the HDF5 community. The data service layer has experienced similar convergence as OGC Service Standards are adopted and used in increasing numbers and connections across former chasms are deployed (ncWMS, ncSOS, netCDF/CF as OGC Standards). Many data providers around the world are in the process of converging towards ISO Standards for documenting data and services. Connections are also helping here (ncISO). Many groups are now working towards convergence in the conventions layer. The HDF-EOS and Climate-Forecast conventions have been used successfully for many datasets spanning many Earth Science disciplines. These two sets of conventions reflect different histories and approaches that provide a rich set of lessons learned as we move forward.
Wednesday, June 19, 4 – 6 PM
Py in the Sky: IPython and other tools for scientific computing
- Monte Lunacek, Application Specialist, CU Research Computing
- Roland Viger, Research Geographer, USGS
Python offers a rich toolkit that is useful for scientific computing. In this talk, we will introduce the IPython package and discuss three useful components: the interactive shell, the web-based notebook, and the parallel interface. We will also demonstrate a few concepts from the Pandas data analysis package and, time permitting, offer a few tips on how to profile and effortlessly speed up your Python code. This talk will describe and illustrate these tools with example code. If Python is not your favorite programming language, this overview might change that.
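As a taste of the profiling topic mentioned above, here is a minimal sketch using the standard library's `cProfile` and `pstats` modules. It is not taken from the speakers' material; the helper `profile_call` and the toy workload `sum_of_squares` are assumptions made for illustration.

```python
import cProfile
import io
import pstats


def sum_of_squares(n):
    """Deliberately simple loop that gives the profiler something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    value, report = profile_call(sum_of_squares, 1_000_000)
    print("result:", value)
    print(report)
```

The report lists where time was spent, which is usually the first thing to check before trying to speed anything up.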
Tuesday, May 21, 4 – 6 PM
NOAA Earth Information Services and TerraViz
- Eric Hackathorn, Julien Lynge, and Jeff Smith, TerraViz, NOAA
- Jebb Stewart, Chris MacDermaid, NEIS, NOAA
The NOAA Earth Information Services (NEIS) is a framework of layered services designed to help the discovery, access, understanding, and visualization of data from the past, present, and future. It includes a visualization component named TerraViz that is a multi-platform tool, running on desktops, web browsers, and mobile devices. The goal is to ingest “big data” and convert that information into efficient formats for real-time visualization. Designed for a world where everything is in motion, NEIS and TerraViz allow fluid data integration and interaction across 4D time and space, providing a tool for everything NOAA does and the people NOAA affects.
TerraViz is built using the Unity game engine. While a game engine may seem a strange choice for data visualizations, our philosophy is to take advantage of existing technology whenever possible. Video games are a multibillion-dollar industry, and are quite simply the most powerful tools for pushing millions of points of data to the user in real-time. Our presentation illustrated displaying environmental data in TerraViz at a global scale, visualizing regional data in “scenes” such as the flooding of the Washington DC area or rotating a coastal ecosystem in three axes, and developing environmental simulations/games like exploring the ocean floor in a submarine. The NEIS backend similarly takes lessons from private industry, using Apache Solr and other open source technologies to allow faceted search of NOAA data, much as sites like Amazon and Netflix do.
We believe that to have an impact on society, data should be easy to find, access, visualize, and understand. NEIS simplifies and abstracts searching, connectivity, and different data formats, allowing users to concentrate on the data and science.
Please contact us if you want to explore including your environmental data within NEIS/TerraViz or if you want to talk to us about developing custom visualizations or educational simulations to showcase your important data.
NOAA/Earth System Research Lab/Global Systems Division, Boulder, Colorado
Wednesday, April 17, 4 – 6 PM
- Chris Lynnes, Chief Systems Engineer, Goddard DAAC, NASA, “The Earth Science Collaboratory”
The Earth Science Collaboratory is a proposed framework for supporting the sharing within the Earth science community of data, tools, analysis methods, and results, plus all the contextual knowledge that goes with these artifacts. The likely benefits include:
- Access to expert knowledge about how to work with data safely and
- Full reproducibility of results
- Efficient collaboration within multi-disciplinary and/or geographically distributed teams
- A social network to bring together researchers and data users with common interests
Currently, there are some nascent efforts to construct such a collaboratory. However, by its very (inclusive) nature, this construction is likely to be most successful as an emergent process, evolving from many point-to-point connections to an eventual ecosystem of cooperating components supporting collaboration.
In particular, the project seeks potential users of such a collaboratory. If this tool sounds interesting to you and you would like to be involved in its design, or you know of someone who might be interested, please spread the word. Tools like this may become significant for doing science in the future. Students and early career researchers are especially encouraged to participate.
Wednesday, March 20, 4 – 6 PM
- Doug Lindholm, LASP, “LaTiS: a data model, an API, a web service AND a floor wax”
LaTiS is a data model, a data analysis API, and a RESTful web service for accessing scientific data via a common interface.
The LaTiS data model provides a domain-independent, unifying mathematical foundation for describing datasets that captures the functional relationships between parameters. The Scala implementation of this model provides an API for reading data directly from their native source, the ability to compute with high-level abstractions appropriate for the task at hand, and options for filtering, transforming, and writing data in various formats.
This talk will discuss how these capabilities are used to enable a modular web service framework that can easily be installed and configured by a data provider, and that allows users to dynamically reformat a dataset, including its time representation, storage format, missing values, etc.
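To make the idea of a dataset as a functional relationship more concrete, here is a minimal sketch in Python. It is not the LaTiS API (LaTiS is implemented in Scala). The class FunctionDataset, its methods, the variable name "irradiance", and the fill value -999.0 are all hypothetical, chosen only to illustrate filtering a dataset and reformatting its time representation and missing values.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional, Tuple

# One sample pairs a domain value (e.g. time) with its range values.
Sample = Tuple[Any, Dict[str, Optional[float]]]


@dataclass(frozen=True)
class FunctionDataset:
    """A dataset modeled as a function: domain value -> range values."""
    domain_name: str
    samples: Tuple[Sample, ...]

    def select(self, keep: Callable[[Any, Dict], bool]) -> "FunctionDataset":
        """Keep only the samples for which keep(domain, range) is true."""
        kept = tuple(s for s in self.samples if keep(s[0], s[1]))
        return FunctionDataset(self.domain_name, kept)

    def map_domain(self, convert: Callable[[Any], Any]) -> "FunctionDataset":
        """Re-express every domain value, e.g. a new time representation."""
        mapped = tuple((convert(d), r) for d, r in self.samples)
        return FunctionDataset(self.domain_name, mapped)

    def map_range(self, convert: Callable[[Any], Any]) -> "FunctionDataset":
        """Apply convert to every range value, e.g. to mask fill values."""
        mapped = tuple(
            (d, {name: convert(v) for name, v in r.items()})
            for d, r in self.samples
        )
        return FunctionDataset(self.domain_name, mapped)


def epoch_seconds_to_iso(seconds: float) -> str:
    """Convert seconds since 1970-01-01 UTC to an ISO 8601 string."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def mask_fill_value(fill: float) -> Callable[[Any], Any]:
    """Build a converter that replaces the fill value with None."""
    return lambda value: None if value == fill else value


# Hypothetical data: time (epoch seconds) -> irradiance, -999.0 = missing.
raw = FunctionDataset("time", (
    (0.0, {"irradiance": 1361.2}),
    (86400.0, {"irradiance": -999.0}),
    (172800.0, {"irradiance": 1360.9}),
))

view = (
    raw.select(lambda t, r: t >= 86400.0)
       .map_domain(epoch_seconds_to_iso)
       .map_range(mask_fill_value(-999.0))
)
```

Because each operation returns a new dataset of the same shape, filters and transformations can be chained in any order, which is the property a web service would need in order to apply user-requested operations dynamically.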
This talk will be a preview (i.e. beta release) of the talk I will give at the UCAR Software Engineering Assembly Conference in April.
Wednesday, February 13, 4 – 6 PM
- Beth Huffer, Lingua Logica, “ODISEES: An Ontology-Driven Interactive Search Environment for Earth Sciences”
As part of an ongoing effort at NASA Langley’s Atmospheric Science Data Center, and in cooperation with the Computational & Information Sciences & Technology Office at the Goddard Space Flight Center, we have developed a semi-automated method for finding and comparing equivalent data and climate model output variables across disparate datasets. We will demonstrate an ontology-driven variable matching service that provides an automated mapping among comparable variables from multiple data products and climate model output products. The interactive user interface is driven by a queryable ontological model of the essential characteristics of data and climate model output variables, the products they occur in, the atmospheric parameters represented in the data, and the instruments and techniques used to measure or model the parameters. Queries of the ontology and triple store are used to match comparable variables by enabling users to search for those that share a user-specified set of essential characteristics.
The application addresses an emerging need among Earth scientists to compare climate model outputs to other models and to satellite observations, and addresses some of the barriers that currently make such comparisons difficult. In particular, the application
- Eliminates the need for users to be familiar with the multiple data vocabularies and standards that exist within the Earth sciences community; and
- With a few mouse clicks, provides ready access to the information needed by scientists to understand the similarities and differences between two or more data or climate model products, enabling them to quickly determine which products best suit their requirements.
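The matching idea described above, finding variables that share a user-specified set of characteristics, can be sketched with a toy in-memory triple store. This is only an illustration under stated assumptions: it is not the ODISEES implementation, and every variable name, predicate, and value below is hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical triples describing three variables from two products.
TRIPLES: Set[Triple] = {
    ("productA:toa_sw_flux", "measuresParameter", "shortwave_flux"),
    ("productA:toa_sw_flux", "atLevel", "top_of_atmosphere"),
    ("productA:toa_sw_flux", "hasUnits", "W m-2"),
    ("modelB:sw_up_toa", "measuresParameter", "shortwave_flux"),
    ("modelB:sw_up_toa", "atLevel", "top_of_atmosphere"),
    ("modelB:sw_up_toa", "hasUnits", "W m-2"),
    ("modelB:sw_down_sfc", "measuresParameter", "shortwave_flux"),
    ("modelB:sw_down_sfc", "atLevel", "surface"),
    ("modelB:sw_down_sfc", "hasUnits", "W m-2"),
}


def find_variables(triples: Set[Triple], required: Dict[str, str]) -> List[str]:
    """Return variables that have every required predicate/object pair."""
    subjects = {s for s, _, _ in triples}
    return sorted(
        s for s in subjects
        if all((s, p, o) in triples for p, o in required.items())
    )


def describe(triples: Set[Triple], subject: str) -> Dict[str, str]:
    """Collect the characteristics recorded for one variable."""
    return {p: o for s, p, o in triples if s == subject}


def differences(
    triples: Set[Triple], a: str, b: str
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each predicate on which a and b disagree to their two values."""
    da, db = describe(triples, a), describe(triples, b)
    return {
        p: (da.get(p), db.get(p))
        for p in sorted(set(da) | set(db))
        if da.get(p) != db.get(p)
    }


matches = find_variables(
    TRIPLES,
    {"measuresParameter": "shortwave_flux", "atLevel": "top_of_atmosphere"},
)
```

Here the user never needs to know each product's own naming scheme; the search is expressed in shared characteristics, and differences() shows how two candidate variables diverge. A production system would use a real triple store and query language rather than a Python set.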
Wednesday, January 16, 4 – 6 PM
- Stephen Williams, Office of Faculty Affairs, CU Boulder, “VIVO, VITRO, DataStar, and Beyond – The VIVO Project”
The VIVO project was started at Cornell University in 2003 as a faculty profiling system for Mann Library. The profiling system was designed in two parts: VITRO, the ontology-agnostic semantic engine, and VIVO, the ontology-specific pages and data for presenting faculty profiles. This two-tiered design was then extended to a third tier with location-specific changes (Cornell and CU-Boulder) and ontologies that build upon VIVO (DataStar). This talk will focus on the VIVO project as a whole, its history, its ancillary projects, and its future. We’ll also try to cover difficulties and lessons in semantic programming and the experiences of building ETL tools for semantic data.
Wednesday, October 10, 4 – 6 PM
This month we are delighted to have representatives from law, government and science come together to discuss various aspects of science policy. We’ve asked them to consider questions like these:
- What does “science policy” mean to you? To your organization? What impact does it have?
- What are the roles in science policy and what impacts do they have? Who are the main players?
- How have you or your organization tried to impact science policy? What worked and what did not work? What did you learn?
- How does one prepare for a science policy discussion? Any do’s and don’ts?
- Scientists and engineers are trained to think and communicate in certain ways. Should those same skills be applied to policy discussions?
- If someone wanted to move more heavily into science policy, how would you advise them? What career moves would be good? Any bad career moves?
- Peter Backlund,
Director, NCAR External Relations and the Integrated Science Program
Director, Research Relations, NCAR
- Dan Baker,
Professor of Astrophysical and Planetary Sciences
Director, Laboratory for Atmospheric and Space Physics
- Alice Madden,
Wirth Chair in Sustainable Development, UC Denver
Colorado House Representative (2001 – 2010), Majority Leader (2004 – 2008)
Climate Change Adviser, Deputy Chief of Staff for Gov. Ritter
Senior Fellow on Climate Change, Center for American Progress
- Andy Schultheiss,
District Director at Office of Congressman Jared Polis
Campaigns Director at League of Conservation Voters
Boulder City Council (2003 – 2007)
The discussion will be available via WebEx; details to follow.
Wednesday, September 19, 4 – 6 PM
- Anna Milan, NOAA/NESDIS/NGDC, “Metadata for the Archive: Transition to ISO, Approaches, Challenges, and Opportunities”
- Dave Fulker, President, OPeNDAP, Inc., “A (Very) Rough Idea: Raster Binning and Masking Services”
Dave will sketch his idea for a new type of data query/response service built (perhaps for EarthCube) around a standardized space-time raster that has a dual function. Tentatively dubbed “Raster Binning & Masking Services” or RBinMasks, the service would give users a (potentially standard) way to specify (irregular) space-time regions of interest and a (potentially standard) way to gain information about the space-time distributions of pertinent data, without, or before, retrieving actual values.
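The dual role of such a raster can be sketched as follows. This is an illustration of the general idea only, not Dave's design: the cell sizes, the function names to_cell, bin_counts, and counts_in_region, and the sample observations are all assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Cell = Tuple[int, int, int]  # (time index, latitude index, longitude index)

# Assumed raster resolution: 1-day by 10-degree by 10-degree cells.
CELL_DAYS = 1.0
CELL_DEGREES = 10.0


def to_cell(day: float, lat: float, lon: float) -> Cell:
    """Map a (day, latitude, longitude) point onto the shared raster."""
    return (
        math.floor(day / CELL_DAYS),
        math.floor((lat + 90.0) / CELL_DEGREES),
        math.floor((lon + 180.0) / CELL_DEGREES),
    )


def bin_counts(points: Iterable[Tuple[float, float, float]]) -> Counter:
    """Binning: count how many observations fall into each raster cell."""
    return Counter(to_cell(day, lat, lon) for day, lat, lon in points)


def counts_in_region(counts: Counter, mask: Set[Cell]) -> Dict[Cell, int]:
    """Masking: keep only the counts for cells in the region of interest."""
    return {cell: n for cell, n in counts.items() if cell in mask}


# Hypothetical observation locations (day, latitude, longitude); no values.
observations = [
    (0.5, 40.0, -105.3),
    (0.7, 41.0, -104.0),
    (1.2, 40.0, -105.3),
    (0.1, -33.0, 151.0),
]

# An irregular region of interest is simply a set of raster cells.
region = {to_cell(0.0, 40.0, -105.0), to_cell(1.0, 40.0, -105.0)}

coverage = counts_in_region(bin_counts(observations), region)
```

The same raster serves both purposes: the set of cells expresses where and when the user is interested, and the per-cell counts describe where and when data exist, so coverage can be inspected before any actual values are retrieved.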
Wednesday, August 15, 4 – 6 PM
- Brian Wee, NEON, Inc., “NEON: A continental-scale research and operations platform for the environmental sciences”
As NEON, Inc.’s Chief of External Affairs, Brian is the organization’s liaison to Congress, US Federal agencies, and other scientific organizations. He also represents the informatics needs of the large-scale environmental sciences before the computer science and Federal data community. Brian joined the NEON Project Office at the American Institute of Biological Sciences in 2004 as a post-doctoral associate, then became a staff scientist before transitioning to the role of Administrative Director. Previously he worked for Andersen Consulting (now Accenture) designing and implementing IT solutions and then served as Senior Instructional Designer leading instructional design, knowledge management, business-process redesign, and web development projects.
Brian holds a Ph.D. in Ecology, Evolution, and Behavior from the University of Texas at Austin, an M.Sc. degree in Computer Science – Artificial Intelligence from Northwestern University, Evanston, IL, and a B.Sc. in Information Systems and Computer Science from the National University of Singapore. His M.Sc. studies focused on designing and implementing computer augmented learning solutions for high-school classrooms and corporate training at the Institute for the Learning Sciences. His Ph.D. focused on investigating the relative effects of behavioral, physiological and landscape barriers on the genetic structure of insect populations by integrating genetic, behavioral, and GIS analyses.
Tuesday, July 24, 4 – 6 PM
- Jeff Morisette, United States Geological Survey (USGS), “Developing a common modeling framework for the Department of Interior’s North Central Climate Science Center”
This month, the Boulder Earth and Space Science Informatics Group welcomes Jeff Morisette, visiting us from USGS in Fort Collins to talk about, among other things, his experience with VisTrails.
Jeff is currently the director of the DOI North Central Climate Science Center where he manages and conducts research on how natural and cultural land management can respect the non-stationary nature of climate. A current research theme is how dynamic species distribution models can contribute to vulnerability assessment and adaptation planning.
Wednesday, June 20, 4 – 6 PM
- SiriJodha Khalsa, National Snow and Ice Data Center (NSIDC), “Modeling the Model—the Semantics of the CCSM4 Sea Ice Model”
- Don Elsborg, Laboratory for Atmospheric and Space Physics (LASP), “Applied Semantic Web Technology—A use case with Semantic Mediawiki”
Wednesday, May 16, 4 – 6 PM
- Stephan Zednick, Rensselaer Polytechnic Institute (RPI), “Data Models and Ontologies, describing structure and classification”
Wednesday, April 18, 5 – 7 PM
This month we’ll review the recent UCAR data citation workshop, then make a foray into ontology and semantic-related areas.
In May and June we’ll continue with speakers on ontology and semantic-related topics. If you have experiences in this area that you are willing to share, please contact Anne.
- Matt Mayernik, NCAR Library: “UCAR Workshop Review – Bridging Data Lifecycles: Tracking Data Use via Data Citations.”
Download presentation (PPT 3.4 MB)
(Note: Many of these slides were taken from the workshop presentations posted at https://library.ucar.edu/events/bridging-data-lifecycles-tracking-data-use-data-citations. Original slide authors are noted in red text in the top left of the slides.)
- Ruth Duerr, NSIDC: “Early Experiences in Semantics.”
Download presentation (PDF 9.4 MB)