The U.S. Department of Energy and Silicon Graphics Inc. have partnered to develop new storage technology for data-intensive computing.
The two will conduct research into “active storage,” an effort to shift computation and transformation of data from client computers to storage devices. Eng Lim Goh, SGI’s chief technology officer, claims the effort “holds the promise of dramatic productivity breakthroughs for a broad range of computing disciplines saddled by large data.”
The alliance, part of a long-term collaboration between SGI and DOE's Pacific Northwest National Laboratory (PNNL), includes options for more than 2.5 petabytes of storage over the next two years. PNNL scientists will be able to take raw data sets stored on the file server and run computations to identify data signatures and patterns before the data is transferred to client systems.
“By developing methods to perform computing inside the file system, we will be able to reduce the amount of redundant data transfers, which routinely undermines productivity and lengthens the time to solution,” states Scott Studham, PNNL associate director for advanced computing.
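In rough terms, the difference can be sketched in a few lines of Python. The function, file path and "signature" pattern below are illustrative assumptions rather than anything drawn from PNNL's or SGI's software; the point is only that the pattern-matching pass runs where the data lives, so clients receive matching records instead of the whole raw data set.

    import re

    # Hypothetical data signature the scientists want to locate.
    SIGNATURE = re.compile(rb"GATTACA")

    def scan_for_signature(path):
        """Runs on the storage side: read the raw data set locally and
        return only the matching records, so the bulk data never leaves
        the file server."""
        hits = []
        with open(path, "rb") as f:
            for record_number, record in enumerate(f):
                if SIGNATURE.search(record):
                    hits.append((record_number, record))
        return hits

    # Conventional approach: a client copies the entire raw file over the
    # network and filters it locally -- the redundant transfer Studham
    # describes. Active storage instead runs scan_for_signature() inside
    # the file system and ships back only the hits.

In practice the computation would be expressed through the file system's own interfaces rather than a user script, but the traffic pattern is the same: filtering happens before the network, not after.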
The new file system is expected to sustain write rates in excess of 8Gbps and demonstrate single-client write rates higher than 600Mbps. The system will leverage Lustre, an open source, object-based file system whose development is led by Cluster File Systems Inc. with funding from the Department of Energy. Lustre is used on four of the top five supercomputers, including the PNNL cluster based on 1,900 Intel Itanium 2 processors.
SGI also plans to evaluate how the research effort can contribute to the evolution of its SGI InfiniteStorage CXFS shared file system.
“The scientific and HPC community continues to push the boundaries of physical and software computing systems by creating and consuming ever larger data structures,” states William Hurley, senior analyst at the Enterprise Storage Group’s Enterprise Application Group.
Hurley says using the Lustre and CXFS file systems as a base Data Abstraction Layer provides common, shared access to a large data set while reducing the operating burden on distributed network and compute resources.
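The access pattern Hurley describes can be sketched just as simply; the mount point and file name below are hypothetical, and the sketch assumes a POSIX-style shared mount of the kind Lustre or CXFS presents. Every compute node opens the same file on the shared file system and reads only its own slice, rather than staging a private copy first.

    SHARED_PATH = "/mnt/lustre/experiments/run42/raw.dat"  # hypothetical shared mount

    def read_my_slice(node_id, num_nodes):
        """Each compute node reads only its own byte range of the shared
        data set; no per-node copies are made."""
        with open(SHARED_PATH, "rb") as f:
            f.seek(0, 2)                    # seek to end to learn the file size
            total_bytes = f.tell()
            slice_bytes = total_bytes // num_nodes
            f.seek(node_id * slice_bytes)
            return f.read(slice_bytes)

    # Without a shared file system, each node would first stage its own
    # copy of the data over the network, duplicating transfers and local
    # storage -- the operating burden Hurley says a Data Abstraction
    # Layer avoids.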
“The desire of many large and small organizations to achieve this operating condition for their file and relational data is high,” he says. “Only a handful of companies are delivering this type of functionality today.”
“Giving commercial organizations the ability to deploy a Data Abstraction Layer provides app developers, business analysts and IT operations staff a standardized data medium that expedites app development and reduces data errors that cause inaccurate reporting outcomes, while easing the burden on physical system resources and their management,” Hurley says.