Storage for High Performance Computing
With ever-increasing volumes of data created by new sources of information in our connected world, managing petabytes of data has proven to be a real challenge. Drawing on long-standing, successful experience in implementing massive data management solutions based on multiple technologies, Atos now offers not only tailored solutions but also Bull Director for HPSS, the first component of a range of Bull software dedicated to storage for high performance computing.
Bull Director for HPSS: smartly improve your HPSS performance
METEO FRANCE “extreme data at work for massive data processing”
University of Warwick “extreme data at work for massive data analytics”
As data sets of hundreds of terabytes or many petabytes become increasingly commonplace, Hierarchical Storage Management (HSM) becomes a cornerstone of successful data-intensive management systems. HSM is a data storage technique that automatically moves data between high-performance, high-cost storage devices and low-end, low-cost storage devices with higher capacity. To optimize costs, HSM systems store the bulk of the data on slow, economical devices such as robotic tape libraries, and copy data to faster disk drives whenever it is needed for processing.
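The tiering behaviour described above can be illustrated with a minimal sketch. It is a toy model and not how HPSS is implemented: the class name `TwoTierStore`, the least-recently-used eviction rule and the byte sizes are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TwoTierStore:
    """Toy HSM (hypothetical): a small fast disk tier in front of a large tape tier."""
    disk_capacity: int                          # bytes available on the fast tier
    disk: dict = field(default_factory=dict)    # file name -> size, oldest first
    tape: dict = field(default_factory=dict)    # file name -> size

    def disk_used(self) -> int:
        return sum(self.disk.values())

    def archive(self, name: str, size: int) -> None:
        """Store a file on the economical tier only."""
        self.tape[name] = size

    def _make_room(self, size: int) -> None:
        """Drop least recently used disk copies until `size` bytes fit."""
        while self.disk and self.disk_used() + size > self.disk_capacity:
            oldest = next(iter(self.disk))      # dicts preserve insertion order
            del self.disk[oldest]               # the tape copy is kept

    def read(self, name: str) -> str:
        """Return the tier that served the read, recalling from tape if needed."""
        if name in self.disk:
            self.disk[name] = self.disk.pop(name)   # refresh LRU position
            return "disk"
        size = self.tape[name]
        if size > self.disk_capacity:
            raise ValueError(f"{name} does not fit on the disk tier")
        self._make_room(size)
        self.disk[name] = size                  # copy to disk, tape keeps the original
        return "tape"


store = TwoTierStore(disk_capacity=100)
store.archive("run1.dat", 60)
store.archive("run2.dat", 60)
print(store.read("run1.dat"))   # tape (first access triggers a recall)
print(store.read("run1.dat"))   # disk (already staged)
print(store.read("run2.dat"))   # tape (staging it evicts run1.dat from disk)
print(store.read("run1.dat"))   # tape (must be recalled again)
```

The last two reads show why recall latency matters: once the fast tier is full, every newly requested file costs a trip to the slow tier.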
HPSS (High Performance Storage System) is an HSM system born from a collaboration between IBM Global Business Services and five US DOE laboratories, recently joined by the French CEA/DAM. Atos has long experience of implementing HPSS in challenging environments where long-term data preservation and re-use of massive data sets are key. Atos developed Bull Director for HPSS, an add-on to this HSM, to remove some HPSS bottlenecks. It improves the overall performance of HPSS with tape libraries, in environments with many concurrent access requests, by grouping simultaneous requests from different users in real time.
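To show why grouping concurrent requests can help with tape libraries, here is a minimal sketch. It assumes that requests are grouped by the tape holding each file and then ordered by position on that tape. The text does not say how Bull Director for HPSS groups requests, so the grouping key, the `Recall` record and its fields are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Recall(NamedTuple):
    user: str
    path: str
    tape: str       # hypothetical label of the tape holding the file
    position: int   # hypothetical offset of the file on that tape


def group_by_tape(requests):
    """Group pending recalls per tape, each group sorted by tape position."""
    groups = defaultdict(list)
    for req in requests:
        groups[req.tape].append(req)
    return {tape: sorted(reqs, key=lambda r: r.position)
            for tape, reqs in groups.items()}


def count_mounts_in_arrival_order(requests):
    """Tape mounts needed by a single drive serving requests strictly in order."""
    mounts, current = 0, None
    for req in requests:
        if req.tape != current:
            mounts += 1
            current = req.tape
    return mounts


pending = [
    Recall("alice", "/a/1", "T1", 40),
    Recall("bob", "/b/1", "T2", 10),
    Recall("alice", "/a/2", "T1", 5),
    Recall("bob", "/b/2", "T2", 30),
]
grouped = group_by_tape(pending)
print(count_mounts_in_arrival_order(pending))   # 4 mounts without grouping
print(len(grouped))                             # 2 mounts, one per tape
```

In this toy case the interleaved requests of two users need four mounts in arrival order, while the grouped schedule mounts each tape once and reads its files in a single pass.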
What is the problem to solve?
On large-scale compute systems, applications are usually run as batch jobs: users submit resource requests to the system, such as a specified number of CPU cores and a list of the files needed to set up the run-time environment, including input data files and placeholders for result files.
Handling tens of terabytes and millions of files
With teraflops- or petaflops-scale systems, and depending on the type of applications run, the total dataset size and file count may reach tens of terabytes and several million files per batch job.
Each system is usually shared by several jobs so that administrators get the maximum use out of the machine (as close as possible to 100% utilization). This puts pressure on the HSM to bring up the right environment, or push it to a lower tier, as fast as possible. At run time, these large datasets are stored on fast disks, allowing fast turnarounds. Once applications have completed their tasks, the datasets must be moved to cheaper, higher-capacity media. This means that at job start and stop times the HSM may suddenly have to manage millions of concurrent file migration requests to free space on fast disks, for each of the concurrent jobs. The result can easily be several batch recall requests of 10,000+ files each being sent to HPSS. These are processed sequentially, leaving end users no option but to wait for previous recalls to complete (minutes or even hours).
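The effect of sequential processing on waiting time can be estimated with a back-of-the-envelope model. The function below is a sketch; the figure of 0.5 seconds per recalled file is an arbitrary illustrative assumption, not a measured HPSS value.

```python
def fifo_wait_times(batch_sizes, seconds_per_file):
    """Seconds each batch waits before it starts when batches run one after another."""
    waits, elapsed = [], 0.0
    for n_files in batch_sizes:
        waits.append(elapsed)                   # time spent behind earlier batches
        elapsed += n_files * seconds_per_file   # this batch then occupies the queue
    return waits


# Three batch recalls of 10,000 files each, at an assumed 0.5 s per file.
print(fifo_wait_times([10_000, 10_000, 10_000], 0.5))   # [0.0, 5000.0, 10000.0]
```

Under these assumed numbers the third batch waits 10,000 seconds, close to three hours, before its first file is even requested from tape.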
Bull Director for HPSS removes the recall queue bottleneck.
Improving weather forecasting and climate prediction is a priority for METEO FRANCE. Its aim is to accelerate the prediction and location of severe phenomena and to anticipate climate change and its impacts. Recent improvements are the result of more observations and more accurate models, and they required a new generation of world-class supercomputers and a new architecture capable of managing and computing on massive data sets.
The University of Warwick has developed, in partnership with Atos, a project dedicated to urban science that evaluates opportunities to enhance the quality and performance of urban services. The University of Warwick has a global partnership with the Center for Urban Science and Progress in New York and works with other prestigious universities including New York University, the City University of New York, Carnegie Mellon University, the University of Toronto and IIT Mumbai. The University of Warwick is partnering with Atos to deliver a unique Big Data capability that supports large-scale data processing and analytics and provides new urban services, ranging from data-driven real-estate valuation, parking analysis for urban planning, analysis of citizen motion and disease spread control to traffic management, cooperative navigation, road quality evaluation, predictive analysis for vehicle maintenance and cybersecurity. Other services are being evaluated. Atos has delivered the hardware platform, based on its bullx HPC servers, and works with the University of Warwick to deliver high-volume data analytics and streaming analysis for high-velocity data. The partners are currently evaluating the use of accelerators and in-memory analytics services.