One Million Queries per Hour TPC-H at 30 Terabytes by Sun and ParAccel


23 Jun 2009

TPC-H Record Result at 30TB on Sun Fire X4540 OpenSolaris Cluster - ParAccel

A new and exceptional TPC-H result submitted today has been obtained on a cluster of 43 Sun Fire X4540
servers, each equipped with two AMD Opteron 2356 2.3 GHz processors,
running ParAccel Analytic Database on Sun OpenSolaris 2009.06. The Sun/ParAccel cluster achieved
a result of 1,050,556.20 QphH@30000GB with a price/performance of
$2.86/QphH@30000GB.

This is a world record for both performance and price-performance
at the largest TPC-H scale factor (30TB) to date.

As of today, the only other 30TB result posted is on a single HP Superdome, powered by 64
x 1.6 GHz Itanium2 Dual Core processors running Oracle 10gR2. The HP
result is 150,960 QphH @30000GB with a price-performance of $46.69/QphH
@30000GB.

This result establishes the overall leadership of the
Sun/ParAccel/OpenSolaris cluster solution in Decision Support Systems (DSS) and Data
Warehousing.

  • The Sun Fire X4540 / ParAccel cluster was nearly seven times (7x) faster than the HP
    Superdome and had sixteen times (16x) better price-performance. In
    addition, the total cost of the Sun/ParAccel configuration (hardware + software +
    3 years of maintenance) is less than half the total cost of the
    HP/Oracle configuration.

  • The Sun Fire X4540 cluster storage consisted entirely of fully mirrored
    internal drives. It used 1008 fewer disk spindles than the HP
    Superdome solution (2064 vs. 3072 disks), greatly simplifying hardware
    logistics, in a fraction of the floor space (172 RU vs. 1120 RU).

  • This is one of the TPC-H results from the new generation of DSS DBMS
    (column based, shared nothing, data compression, etc.). It is
    noteworthy that all of the other new-generation TPC-H
    submissions (at 100GB up to 3000GB) ran queries entirely from memory.
    This new result is disk based and thus establishes the leadership and
    viability of the Sun/ParAccel/OpenSolaris solution on shared nothing
    clusters for very large disk-based databases, much larger than the
    memory sizes realistically available even in extremely large database
    installations.

  • A number of new-generation DBMS designed for decision support, such as
    ParAccel, are either currently for sale or still under development, and
    all of them are implemented on Linux. This result is the first
    public proof point of a new-generation data warehousing product running
    on Solaris, more specifically OpenSolaris.

  • Loading the 30TB database on the Sun/ParAccel cluster was 4 times
    faster than on the HP Superdome solution. For large DSS databases,
    load time is a very important factor.
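The ratios quoted in the bullets can be rechecked from the published figures. The short Python sketch below uses only numbers that appear in this post (the composite from the opening paragraph, prices from the table, disk counts and rack units from the bullets); the dictionary layout and names are ours, for illustration only.

```python
# Published figures for the two 30TB TPC-H results (copied from this post).
SUN = {"qphh": 1_050_556.20, "price_per_qphh": 2.86, "total_cost": 3_006_861,
       "disks": 2064, "rack_units": 172}
HP = {"qphh": 150_960.0, "price_per_qphh": 46.69, "total_cost": 7_048_342,
      "disks": 3072, "rack_units": 1120}


def ratio(a: float, b: float) -> float:
    """Return a / b rounded to two decimals for display."""
    return round(a / b, 2)


speedup = ratio(SUN["qphh"], HP["qphh"])                               # QphH advantage
price_perf_gain = ratio(HP["price_per_qphh"], SUN["price_per_qphh"])  # lower $/QphH is better
cost_fraction = ratio(SUN["total_cost"], HP["total_cost"])             # Sun cost as share of HP cost
fewer_disks = HP["disks"] - SUN["disks"]
floor_space_fraction = ratio(SUN["rack_units"], HP["rack_units"])

print(f"speedup: {speedup}x, price/performance: {price_perf_gain}x better")
print(f"total cost: {cost_fraction:.0%} of HP, {fewer_disks} fewer disks, "
      f"{floor_space_fraction:.0%} of the rack space")
```

The performance ratio comes out at about 6.96, just under seven, and the price/performance ratio at about 16.3.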

Performance Landscape

ch/co/th = chips, cores, threads
$/QphH   = TPC-H Price/Performance metric (smaller is better)
QphH     = TPC-H Composite Metric (bigger is better)

System                 ch/co/th    Database  QphH        $/QphH  Price        # Disks  Available
43 x Sun Fire X4540    86/344/344  PADB      1,050,556    2.86   $3,006,861   2064     06/21/09
1 x HP Superdome       64/128/128  Oracle      150,960   46.69   $7,048,342   3072     06/18/07

Complete benchmark results may be found at the TPC benchmark
website http://www.tpc.org.

Results and Configuration Summary

Servers:

    43 x Sun Fire X4540, each with:

      2 x AMD Opteron 2356 2.3 GHz quad-core processors
      64 GB memory
      48 x 500GB (7,200 RPM) internal SATA disks

    86 total processors
    344 total processor cores
    344 total processor threads

Storage:

    No external storage

Switches:

    3 x 48-port Cisco 3750 + 4 x 24-port Cisco 3750 1Gb Ethernet switches

Software:

    Operating System: OpenSolaris 2009.06
    Database Manager: ParAccel PADB

Audited Results:

    Database Size: 30,000 GB (Scale Factor)
    TPC-H Composite: 1,050,556.20 QphH@30000GB
    Price/performance: $2.86 / QphH@30000GB
    Available: June 21, 2009
    Total 3 Year Cost: $3,006,861
    TPC-H Power: 1,326,910.40
    TPC-H Throughput: 831,758.00
    Database Load Time: ~3 Hours 29 minutes
    Storage Ratio: 32.04
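Rolling the per-server configuration up into cluster totals also shows the scale of the disk-based claim: the 30,000 GB of raw data is roughly eleven times the cluster's combined 2,752 GB of memory, before any compression. A minimal Python sketch of that arithmetic, using only the configuration listed above (the constant names are ours):

```python
# Per-server configuration from the summary above.
SERVERS = 43
CHIPS_PER_SERVER = 2
CORES_PER_CHIP = 4            # AMD Opteron 2356 is a quad-core part
MEMORY_GB_PER_SERVER = 64
DISKS_PER_SERVER = 48
DISK_GB = 500
SCALE_FACTOR_GB = 30_000      # raw TPC-H data size

total_chips = SERVERS * CHIPS_PER_SERVER             # processors in the cluster
total_cores = total_chips * CORES_PER_CHIP           # processor cores in the cluster
total_memory_gb = SERVERS * MEMORY_GB_PER_SERVER     # combined RAM
total_disks = SERVERS * DISKS_PER_SERVER             # internal SATA disks
raw_disk_gb = total_disks * DISK_GB                  # raw capacity, before mirroring

# How many times larger the raw (uncompressed) data is than the cluster's RAM.
data_to_memory = SCALE_FACTOR_GB / total_memory_gb
print(total_chips, total_cores, total_memory_gb, total_disks, round(data_to_memory, 1))
```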

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the
Transaction Processing Performance Council (TPC) to measure Data
Warehousing/Decision Support System (DSS) performance. TPC-H measurements are
produced for customers to evaluate the performance of various DSS
systems. The benchmark's queries and updates are executed against a standard
database under controlled conditions. Performance projections and
comparisons between different TPC-H database sizes (100GB, 300GB,
1000GB, 3000GB, 10000GB and 30000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific
benchmark that consists of a large number of complex queries typical
of decision support applications. It also includes some insert and
delete activity that is intended to simulate loading and purging data
from a warehouse. TPC-H measures the combined performance of a
particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H
Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the
number of GB of raw data, referred to as the scale factor). QphH@SF
is intended to summarize the ability of the system to process queries
in both single-user and multi-user modes; it is the geometric mean of the
TPC-H Power (single-user) and TPC-H Throughput (multi-user) metrics. The
benchmark requires reporting of price/performance, which is the ratio of
the total HW/SW cost plus 3 years of maintenance to QphH. A secondary metric
is the storage efficiency, which is the ratio of total configured disk space
in GB to the scale factor.
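These definitions can be applied to the audited figures in the configuration summary. The composite is the geometric mean of Power and Throughput, and price/performance is the total 3-year cost divided by QphH. The storage ratio line rests on an assumption of ours, flagged in the code: that each 500 GB disk is counted at its binary capacity (bytes / 2^30), which happens to reproduce the reported 32.04 but is not stated in the disclosure. A small Python sketch:

```python
import math

# Audited figures from the Results and Configuration Summary above.
POWER = 1_326_910.40          # TPC-H Power (single-user) @30000GB
THROUGHPUT = 831_758.00       # TPC-H Throughput (multi-user) @30000GB
TOTAL_3YR_COST = 3_006_861    # hardware + software + 3 years maintenance, USD
SCALE_FACTOR_GB = 30_000

# Composite metric: geometric mean of the Power and Throughput metrics.
qphh = math.sqrt(POWER * THROUGHPUT)

# Price/performance: total cost divided by QphH (smaller is better).
price_per_qphh = TOTAL_3YR_COST / qphh

# Storage ratio: configured disk space divided by the scale factor.
# ASSUMPTION (ours): each 500 GB disk is counted at 500e9 / 2**30 GiB.
disks = 43 * 48
disk_capacity_gib = 500e9 / 2**30
storage_ratio = disks * disk_capacity_gib / SCALE_FACTOR_GB

print(f"QphH@30000GB  = {qphh:,.1f}")
print(f"$/QphH        = {price_per_qphh:.2f}")
print(f"storage ratio = {storage_ratio:.2f}")
```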

Key Technical Points

ParAccel PADB is one of a new generation of DBMS designed specifically
for Decision Support and Data Warehousing applications. The Sun Fire X4540 and
OpenSolaris 2009.06 are a perfect match for the PADB solution: the Sun Fire
X4540 offers a large amount of internal storage in a compact form
factor, and OpenSolaris offers ISM (Intimate Shared Memory) shared memory
management, strong network performance and the powerful DTrace performance
analysis tools. Below are the main architectural features of the ParAccel product:

Shared Nothing Architecture

Shared nothing is the best hardware architecture for highly
parallel database operations in DSS environments. The inherent divide
and conquer approach of distributing data over many nodes
proportionally reduces the amount of work each node must do and thus
has the potential for near linear scalability.
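The divide-and-conquer idea can be illustrated with a toy example: rows are hash-distributed across nodes on a distribution key, each node aggregates only the rows it owns, and a coordinator merges the small partial results. This is a generic Python sketch with made-up data and hypothetical function names (`partition`, `local_aggregate`, `merge`); it is not ParAccel's implementation, and the 43 "nodes" are just in-process lists.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical rows: (order_key, region, revenue). Not TPC-H data.
Row = tuple[int, str, float]


def partition(rows: Iterable[Row], num_nodes: int) -> list[list[Row]]:
    """Distribute rows across nodes by hashing the distribution key (order_key)."""
    nodes: list[list[Row]] = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[hash(row[0]) % num_nodes].append(row)
    return nodes


def local_aggregate(node_rows: list[Row]) -> dict[str, float]:
    """Each node sums revenue per region over only the rows it owns."""
    totals: dict[str, float] = defaultdict(float)
    for _, region, revenue in node_rows:
        totals[region] += revenue
    return totals


def merge(partials: list[dict[str, float]]) -> dict[str, float]:
    """A coordinator combines the small per-node results into the final answer."""
    result: dict[str, float] = defaultdict(float)
    for part in partials:
        for region, revenue in part.items():
            result[region] += revenue
    return dict(result)


rows = [(k, ["EUROPE", "ASIA", "AMERICA"][k % 3], float(k)) for k in range(1, 10_001)]
nodes = partition(rows, num_nodes=43)
answer = merge([local_aggregate(n) for n in nodes])
print(max(len(n) for n in nodes), answer["ASIA"])
```

With 10,000 rows over 43 nodes, no node holds more than 233 rows, so each node's share of the scan shrinks roughly in proportion to the node count.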

Column Based Physical Storage

Relational tables can be physically stored on disk in a row oriented
fashion, or in a column oriented fashion. In the row oriented option,
all columns of each row are stored contiguously on disk. By contrast,
the column oriented option stores all the values of each column
contiguously on disk. The choice of row store vs. column store may at
first glance seem arbitrary, but in fact has profound consequences on
the amount of I/O bandwidth, memory bandwidth and CPU requirements
necessary for processing various types of queries.
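A small illustration of the I/O consequence: for a query that touches a single column (here, summing a hypothetical `price` column), a row-oriented layout must read every byte of every row, while a column-oriented layout reads only that column's values. The Python sketch below packs the same hypothetical 1,000-row table both ways with the standard `struct` module; it ignores pages, indexes and compression, so it shows only the layout idea, not how PADB stores data.

```python
import struct

# Hypothetical table with three fixed-width columns: (int64 key, float64 price, int32 qty).
rows = [(i, i * 1.5, i % 50) for i in range(1000)]
ROW_FORMAT = "<qdi"           # 8 + 8 + 4 = 20 bytes per row, no alignment padding

# Row store: all columns of each row are contiguous.
row_store = b"".join(struct.pack(ROW_FORMAT, *r) for r in rows)

# Column store: all values of each column are contiguous, one buffer per column.
n = len(rows)
column_store = {
    "key": struct.pack(f"<{n}q", *(r[0] for r in rows)),
    "price": struct.pack(f"<{n}d", *(r[1] for r in rows)),
    "qty": struct.pack(f"<{n}i", *(r[2] for r in rows)),
}

# SELECT SUM(price): the row store reads whole rows, the column store one buffer.
bytes_read_row_store = len(row_store)
bytes_read_column_store = len(column_store["price"])
total = sum(struct.unpack(f"<{n}d", column_store["price"]))
print(bytes_read_row_store, bytes_read_column_store, total)
```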

Aggressive Data Compression

There are dozens of known techniques for storing data in a manner
requiring fewer bytes than the original plain form of the data. The
techniques are referred to as data compression algorithms. ParAccel
uses several very effective data compression techniques. Compression is
beneficial for query processing in that it reduces the amount of data
that needs to be read from disk, and the amount of main memory space
needed for processing the data. Both of these characteristics lead to
query processing efficiencies and cost efficiencies.
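Run-length encoding is one of the simplest such techniques and shows why compression helps: a sorted, low-cardinality column collapses to a handful of (value, count) pairs, which are cheaper to read and to hold in memory, and some aggregates can be computed without decompressing. The source does not say which techniques ParAccel uses, so RLE here is our choice for illustration, applied to made-up data in Python.

```python
from itertools import groupby


def rle_encode(values: list[str]) -> list[tuple[str, int]]:
    """Run-length encode a column: each run of equal values becomes (value, run_length)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs: list[tuple[str, int]]) -> list[str]:
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical sorted, low-cardinality column of 10,000 values.
column = ["AIR"] * 4000 + ["MAIL"] * 3000 + ["SHIP"] * 3000
runs = rle_encode(column)
assert rle_decode(runs) == column      # the encoding is lossless

# Some operations run directly on the compressed form, e.g. a COUNT for one value.
mail_count = sum(count for value, count in runs if value == "MAIL")
print(runs, mail_count)
```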

Low Cost Servers and Interconnects

The ParAccel software does not require expensive proprietary hardware.
Shared nothing clusters of small and low cost systems can provide adequate
memory for aggressively compressed database engines to achieve
performance levels far above the levels achievable by conventional
database engines. In addition, the software does not require expensive
special networking infrastructure; it delivers excellent
performance on standard GbE equipment.


Disclosure Statement

TPC-H@30000GB Sun Fire X4540 1,050,556 QphH@30000GB, $2.86/QphH@30000GB,
availability 6/21/09. TPC-H, HP Integrity Superdome, 150,960 QphH
@30000GB, $46.69 / QphH @30000GB, availability 06/18/07. QphH and
$/QphH are trademarks of the Transaction Processing Performance Council (TPC).
More info at www.tpc.org.

Source: http://blogs.sun.com/BestPerf/entry/sun_and_paraccel_have_broken

1 Response to One Million Queries per Hour TPC-H at 30 Terabytes by Sun and ParAccel


Kim Stanick

June 24th, 2009 at 18:09

Good Summary. Thanks for your perspective!
