Apache OODT

Developer(s)	Apache Software Foundation
Stable release	1.9.1 / October 3, 2021
License	Apache License 2.0
Website	oodt.apache.org

Apache Object Oriented Data Technology (OODT) is an open source data management framework, originally developed at the NASA Jet Propulsion Laboratory, to support capturing, processing and sharing of data for NASA's scientific archives.

History

The project started out as an internal NASA Jet Propulsion Laboratory project initiated by Daniel J. Crichton, Sean Kelly and Steve Hughes. The early focus of the effort was on information integration and search using XML, as described in a paper by Crichton et al. at the CODATA meeting in 2000.^[2]

After OODT was deployed to the Early Detection Research Network (EDRN) project, it moved in 2005 into the era of large-scale data processing and management via NASA's Orbiting Carbon Observatory (OCO) project. OODT's role on OCO was to usher in a new data management processing framework that, instead of tens of jobs per day and tens of gigabytes of data, would handle 10,000 jobs per day and hundreds of terabytes of data. Dr. Chris Mattmann at NASA JPL led a team of three to four developers between 2005 and 2009 that completely re-engineered OODT to support these new requirements.

Influenced by the emerging efforts in Hadoop, in which Mattmann participated, OODT was given an overhaul that made it more amenable to Apache Software Foundation-style projects. In addition, Mattmann had a close relationship with Dr. Justin Erenkrantz, who was the Apache Software Foundation President at the time, and the idea of bringing OODT to the Apache Software Foundation emerged. In 2009, Mattmann and his team received approval from NASA and JPL to bring OODT to Apache, making it the first NASA project to be stewarded by the foundation. Seven years later, the project released version 1.0.

Features

OODT focuses on two canonical use cases: big data processing and information integration. Both were described in Mattmann's ICSE 2006^[3] and SMC-IT 2009^[4] papers. It provides three core services.

File Manager

A File Manager is responsible for tracking file locations, their metadata, and for transferring files from a staging area to controlled access storage.
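
The three responsibilities named above (tracking locations, keeping metadata, and moving files out of staging) can be illustrated with a small sketch. This is a conceptual model only: the class ToyFileManager, its catalog dictionary and the file name granule_001.dat are hypothetical names chosen for illustration and are not part of OODT's actual interface.

```python
import shutil
import tempfile
from pathlib import Path


class ToyFileManager:
    """Illustrative stand-in for a file manager; not the OODT API."""

    def __init__(self, archive_dir):
        self.archive_dir = Path(archive_dir)
        self.catalog = {}  # file name -> {"location": Path, "metadata": dict}

    def ingest(self, staged_file, metadata):
        """Move a staged file into the archive and record where it went."""
        staged_file = Path(staged_file)
        self.archive_dir.mkdir(parents=True, exist_ok=True)
        target = self.archive_dir / staged_file.name
        shutil.move(str(staged_file), str(target))
        self.catalog[staged_file.name] = {
            "location": target,
            "metadata": dict(metadata),
        }
        return target


def demo():
    """Stage one file, ingest it, and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "staging"
        staging.mkdir()
        staged = staging / "granule_001.dat"
        staged.write_text("example payload")

        manager = ToyFileManager(Path(tmp) / "archive")
        archived = manager.ingest(staged, {"instrument": "example"})
        return {
            "archived_exists": archived.exists(),
            "staged_exists": staged.exists(),
            "metadata": manager.catalog["granule_001.dat"]["metadata"],
        }


result = demo()
```

After ingestion the file exists only in the archive directory, and the catalog entry holds both its new location and the metadata supplied at ingest time.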

Workflow Manager

A Workflow Manager captures control flow and data flow for complex processes, and allows for reproducibility and the construction of scientific pipelines.
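
As a rough illustration of control flow and data flow, a pipeline can be modelled as tasks with declared dependencies that read from and write to a shared context. The sketch below uses Python's standard graphlib module; the function run_workflow and the task names calibrate and summarize are hypothetical and do not reflect how OODT defines workflows.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies, context):
    """Run tasks in dependency order.

    tasks: task name -> callable taking the context and returning new values
    dependencies: task name -> set of task names that must finish first
    """
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        # Data flow: each task reads earlier results and adds its own.
        context.update(tasks[name](context))
    return order, context


tasks = {
    "calibrate": lambda ctx: {"calibrated": ctx["raw"] * 2},
    "summarize": lambda ctx: {"summary": ctx["calibrated"] + 1},
}
dependencies = {"calibrate": set(), "summarize": {"calibrate"}}

order, result = run_workflow(tasks, dependencies, {"raw": 10})
```

Returning the execution order alongside the results gives a simple record of what ran and in which sequence, which is one ingredient of a reproducible pipeline.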

Resource Manager

A Resource Manager handles allocation of Workflow Tasks and other jobs to underlying resources, e.g., Python jobs go to nodes with Python installed on them; jobs that require a large disk or CPU are properly sent to those nodes that fulfill those requirements.
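
The matching of jobs to suitable nodes described above can be sketched as a simple capability check. The Node and Job classes, the assign function and the first-fit policy are assumptions made for illustration; OODT's actual scheduling policies are not described in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capabilities: set = field(default_factory=set)
    free_disk_gb: int = 0


@dataclass
class Job:
    name: str
    requires: set = field(default_factory=set)
    disk_gb: int = 0


def assign(job, nodes):
    """Return the first node that satisfies the job's needs, or None."""
    for node in nodes:
        has_software = job.requires <= node.capabilities  # subset test
        has_disk = job.disk_gb <= node.free_disk_gb
        if has_software and has_disk:
            return node
    return None


nodes = [
    Node("node-a", {"java"}, free_disk_gb=50),
    Node("node-b", {"java", "python"}, free_disk_gb=500),
]

python_node = assign(Job("plot", {"python"}, disk_gb=10), nodes)
big_disk_node = assign(Job("reprocess", {"java"}, disk_gb=200), nodes)
unmatched = assign(Job("train", {"cuda"}, disk_gb=1), nodes)
```

Here the Python job and the large-disk job both land on node-b, while a job whose requirements no node meets is left unassigned.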

In addition to the three core services, OODT provides three client-oriented frameworks that build on these services.

File Crawler

A File Crawler automatically extracts metadata, uses Apache Tika to identify file types, and ingests the associated information into the File Manager.
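
The crawling step can be approximated in a few lines. In this sketch Python's standard mimetypes module stands in for Apache Tika, which is a substitution made purely to keep the example self-contained; the crawl function and the record fields are likewise hypothetical. A real crawler would then hand each record to the File Manager.

```python
import mimetypes
import tempfile
from pathlib import Path


def crawl(staging_dir):
    """Walk a directory and build one metadata record per file."""
    records = []
    for path in sorted(Path(staging_dir).rglob("*")):
        if not path.is_file():
            continue
        # Stand-in for Tika: guess the type from the file extension only.
        mime_type, _ = mimetypes.guess_type(path.name)
        records.append({
            "name": path.name,
            "size_bytes": path.stat().st_size,
            "mime_type": mime_type or "application/octet-stream",
        })
    return records


def demo():
    """Create two small files and crawl the directory holding them."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "notes.txt").write_text("hello")
        (Path(tmp) / "blob.unknownext").write_bytes(b"\x00\x01")
        return crawl(tmp)


records = demo()
```

Unlike this extension-based guess, Tika inspects file contents, so the sketch understates what the real component can detect.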

Catalog and Archive Crawling Framework

A Push/Pull framework acquires remote files and makes them available to the system.

Catalog and Archive Service Production Generation Executive (CAS-PGE)

A scientific algorithm wrapper (called CAS-PGE, for Catalog and Archive Service Production Generation Executive) encapsulates scientific codes and allows them to be executed independently of their environment. While doing so, it captures provenance and makes the algorithms easy to integrate into a production system.
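
The idea of wrapping an algorithm and capturing provenance can be sketched as a function that runs an external command and records what was run and how it ended. The function run_wrapped and the fields of the returned record are hypothetical; they illustrate the concept and are not CAS-PGE's configuration or interface.

```python
import subprocess
import sys
import time


def run_wrapped(command):
    """Run an external program and return a provenance record for the run."""
    started = time.time()
    completed = subprocess.run(command, capture_output=True, text=True)
    return {
        "command": list(command),
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "duration_s": time.time() - started,
    }


# The "scientific code" here is a one-line Python program, used so that
# the example runs anywhere a Python interpreter is available.
provenance = run_wrapped([sys.executable, "-c", "print(6 * 7)"])
```

Because the wrapper only sees a command line and its outputs, the wrapped program can be written in any language, which is the sense in which execution becomes independent of the surrounding system.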

CAS RESTful Services

A set of RESTful APIs that expose the capabilities of the File Manager, Workflow Manager and Resource Manager components.

OPSUI Monitor Dashboard

A web application that exposes services from the underlying OODT product, workflow and resource management control systems via the JAX-RS^[citation needed] specification. At this stage it is built using Apache Wicket^[citation needed] components.

The overall motivation for OODT's re-architecting was described by Mattmann in a 2013 paper in Nature titled "A Vision for Data Science".^[5]

OODT is written in Java, and through its REST API^[6] it can be used from other languages, including Python.

Notable uses

OODT has recently been highlighted as contributing to NASA missions including Soil Moisture Active Passive^[7] and New Horizons.^[8] OODT also helps to power the Square Kilometre Array telescope,^[9] extending its use from Earth science and planetary science to radio astronomy and other sectors. OODT is also used within bioinformatics and is a part of the Knowledgent Big Data Platform.^[10]

References

[1] "[ANNOUNCE] Apache OODT 1.9.1 released". Retrieved 27 September 2022.
[2] Crichton, Daniel; Hughes, John; Hyon, Jason; Kelly, Sean (2000). "Science Search and Retrieval using XML". The Second National Conference on Scientific and Technical Data, US National Committee for CODATA, National Research Council.
[3] S2CID 7699385.
[4] S2CID 705732.
[5] PMID 23344342.
[6] "Apache OODT APIs - OODT - Apache Software Foundation". cwiki.apache.org. Retrieved 2016-06-27.
[7] "Apache - The ASF on Twitter". Retrieved 2016-06-27.
[8] "Apache - The ASF on Twitter". Retrieved 2016-06-27.
[9] "Apache - The ASF on Twitter". Retrieved 2016-06-27.
[10] "Q&A on the Advantages of OODT - Object Oriented Data Technology - Knowledgent Perspectives". 2014-07-30. Archived from the original on 2015-04-14. Retrieved 2016-06-27.

External links

http://oodt.apache.org
