
English translation fanatics, do you dare take on the challenge?


Our concurrency control schemes are designed for a partitioned main-memory database system similar to H-Store. This section gives an overview of the relevant aspects of the H-Store design.
Traditional concurrency control comprises nearly half the CPU instructions executed by a database engine. This suggests that avoiding concurrency control can improve throughput substantially. H-Store was therefore designed explicitly to avoid this overhead.
2.1 Transactions as Stored Procedures
H-Store only supports executing transactions that have been pre-declared as stored procedures. Each stored procedure invocation is a single transaction that must either abort or commit before returning results to the client. Eliminating ad-hoc transactions removes client stalls, reducing the need for concurrency.
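To illustrate this execution model, here is a minimal Python sketch (with hypothetical names, not H-Store's actual interface): clients can only invoke procedures registered ahead of time, and each invocation runs to completion as one transaction.

    # Sketch: transactions exist only as pre-declared stored procedures. Each
    # invocation runs start-to-finish as one transaction; there is no ad-hoc
    # SQL session that could hold a transaction open while the client stalls.
    PROCEDURES = {}

    def stored_procedure(func):
        """Pre-declare a function as an invokable stored procedure."""
        PROCEDURES[func.__name__] = func
        return func

    @stored_procedure
    def new_order(db, warehouse_id, items):
        # Deterministic code interleaved with database operations.
        order_id = db.get("next_order_id", 1)
        db["next_order_id"] = order_id + 1
        db[("order", order_id)] = {"warehouse": warehouse_id, "items": items}
        return order_id

    def invoke(db, proc_name, *args):
        """One client request: run the whole procedure, then return its result."""
        return PROCEDURES[proc_name](db, *args)

    print(invoke({}, "new_order", 42, ["item-1"]))   # prints 1
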
2.2 No Disk
Today, relatively low-end 1U servers can have up to 128 GB of RAM, which gives a data center rack of 40 servers an aggregate RAM capacity of over 5 TB. Thus, a modest amount of hardware can hold all but the largest OLTP databases in memory, eliminating the need for disk access during normal operation. This eliminates disk stalls, which further reduces the need for concurrent transactions.
Traditional databases rely on disk to provide durability. However, mission-critical OLTP applications need high availability, which means they use replicated systems. H-Store takes advantage of replication for durability as well as high availability. A transaction is committed when it has been received by k > 1 replicas, where k is a tuning parameter.
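A minimal sketch of that commit rule (our own simplification, with synchronous in-memory lists standing in for replicas):

    # Sketch: durability comes from replication, not disk. A transaction counts
    # as committed once k > 1 replicas have received it.
    def replicate_and_commit(transaction, replicas, k):
        acks = 0
        for replica in replicas:
            replica.append(transaction)   # replica "receives" the transaction
            acks += 1
        return acks >= k                  # committed only if k copies exist

    replicas = [[], [], []]               # three in-memory replica logs
    print(replicate_and_commit("new_order(42)", replicas, k=2))   # True
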
2.3 Partitioning
Without client and disk stalls, H-Store simply executes transactions from beginning to completion in a single thread. To take advantage of multiple physical machines and multiple CPUs, the data must be divided into separate partitions. Each partition executes transactions independently.
The challenge becomes dividing the application's data so that each transaction only accesses one partition. For many OLTP applications, partitioning the application manually is straightforward. For example, the TPC-C OLTP benchmark can be partitioned by warehouse so that an average of 89% of the transactions access a single partition. There is evidence that developers already do this to scale their applications, and academic research provides some approaches for automatically selecting a good partitioning key. However, unless the partitioning scheme is 100% effective in making every transaction access only a single partition, coordination for multi-partition transactions causes network stalls, and executing a transaction to completion without stalls is no longer possible. In this paper, we focus on what the system should do in this case.
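For example, warehouse-based routing can be approximated as follows (a sketch with hypothetical names; in the prototype, the mapping of work to partitions is done manually):

    # Sketch: partition data by warehouse id so that most TPC-C style
    # transactions touch exactly one partition.
    N_PARTITIONS = 4

    def partition_for(warehouse_id):
        """Map a warehouse id to the partition that owns its rows."""
        return warehouse_id % N_PARTITIONS

    def partitions_touched(warehouse_ids):
        """A transaction touching one partition can skip coordination."""
        return {partition_for(w) for w in warehouse_ids}

    print(partitions_touched([7]))        # {3}: single partition transaction
    print(partitions_touched([7, 12]))    # {3, 0}: multi-partition transaction
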
3. EXECUTING TRANSACTIONS
In this section, we describe how our prototype executes transactions. We begin by describing the components of our system and the execution model. We then discuss how single partition and multi-partition transactions are executed.



3.1 System Components
The system is composed of three types of processes, shown in Figure 1. First, the data is stored in partitions: each partition is a single process that stores a portion of the data in memory and executes stored procedures using a single thread. For each partition, there is a primary process and k − 1 backup processes, which ensures that data survives k − 1 failures. Second, a single process called the central coordinator is used to coordinate all distributed transactions. This ensures that distributed transactions are globally ordered, and is described in Section 3.3. Finally, the client processes are end-user applications that issue transactions to the system in the form of stored procedure invocations. When the client library connects to the database, it downloads the part of the system catalog describing the available stored procedures, network addresses for the partitions, and how data is distributed. This permits the client library to direct transactions to the appropriate processes.
Transactions are written as stored procedures, composed of deterministic code interleaved with database operations. The client invokes transactions by sending a stored procedure invocation to the system. The system distributes the work across the partitions. In our prototype, the mapping of work to partitions is done manually, but we are working on a query planner to do this automatically.
Each transaction is divided into fragments. A fragment is a unit of work that can be executed at exactly one partition. It can be some mixture of user code and database operations. A single partition transaction, for example, is composed of one fragment containing the entire transaction. A multi-partition transaction is composed of multiple fragments with data dependencies between them.
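A rough way to picture this decomposition (hypothetical structures, not the prototype's classes):

    # Sketch: a fragment is the unit of work sent to exactly one partition.
    from dataclasses import dataclass, field

    @dataclass
    class Fragment:
        partition: int          # the single partition that executes this work
        operations: list        # mixture of user code and database operations
        depends_on: list = field(default_factory=list)   # fragments whose
                                                          # results are needed

    # Single partition transaction: one fragment containing everything.
    single = [Fragment(partition=0, operations=["read A", "write A"])]

    # Multi-partition transaction: read at P1, then use the result at P2.
    read_p1 = Fragment(partition=1, operations=["read X"])
    write_p2 = Fragment(partition=2, operations=["write Y := f(X)"],
                        depends_on=[read_p1])
    multi = [read_p1, write_p2]
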
3.2 Single Partition Transactions
When a client determines that a request is a single partition transaction, it forwards it to the primary partition responsible for the data. The primary uses a typical primary/backup replication protocol to ensure durability. In the failure-free case, the primary reads the request from the network and sends a copy to the backups. While waiting for acknowledgments, the primary executes the transaction. Since it is a single partition transaction, it does not block. When all acknowledgments from the backups are received, the result of the transaction is sent to the client. This protocol ensures the transaction is durable, as long as at least one replica survives a failure.
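The failure-free path can be sketched like this (synchronous stand-ins for the network; not the real replication code):

    # Sketch: the primary forwards the request to its backups, executes while
    # the acknowledgments are in flight, and replies to the client only after
    # every backup has acknowledged.
    def execute_single_partition(request, primary_state, backups, run):
        acks = []
        for backup in backups:
            backup["log"].append(request)     # send a copy to the backup
            acks.append(True)                 # stand-in for its acknowledgment
        result = run(primary_state, request)  # execute while "waiting"
        if all(acks):                         # all backups have the request,
            return result                     # so the transaction is durable
        raise RuntimeError("backup failed: run the recovery protocol")

    def run(state, request):                  # toy stored procedure
        state[request] = state.get(request, 0) + 1
        return state[request]

    backups = [{"log": []}, {"log": []}]
    print(execute_single_partition("inc:counter", {}, backups, run))   # 1
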
No concurrency control is needed to execute single partition transactions. In most cases, the system executes these transactions without recording undo information, resulting in very low overhead. This is possible because transactions are annotated to indicate if a user abort may occur. For transactions that have no possibility of a user abort, concurrency control schemes that guarantee that deadlock will not occur (see below) do not keep an undo log. Otherwise, the system maintains an in-memory undo buffer that is discarded when the transaction commits.
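One way to picture that undo buffer (a sketch that assumes undo records are simply before-images of modified keys):

    # Sketch: record a before-image per write; drop the buffer on commit,
    # replay it in reverse on a user abort.
    class UndoBuffer:
        def __init__(self, table):
            self.table = table
            self.records = []                       # (key, previous value)

        def write(self, key, value):
            self.records.append((key, self.table.get(key)))
            self.table[key] = value

        def commit(self):
            self.records.clear()                    # nothing to keep

        def abort(self):
            for key, old in reversed(self.records): # newest change first
                if old is None:
                    self.table.pop(key, None)       # the key did not exist
                else:
                    self.table[key] = old
            self.records.clear()

    table = {"x": 1}
    undo = UndoBuffer(table)
    undo.write("x", 2)
    undo.write("y", 9)
    undo.abort()
    print(table)                                    # {'x': 1}
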
3.3 Multi-Partition Transactions
In general, multi-partition transactions can have arbitrary data dependencies between transaction fragments. For example, a transaction may need to read a value stored at partition P1 in order to update a value at partition P2. To ensure multi-partition transactions execute in a serializable order without deadlocks, we forward them through the central coordinator, which assigns them a global order. Although this is a straightforward approach, the central coordinator limits the rate of multi-partition transactions. To handle more multi-partition transactions, multiple coordinators must be used. Previous work has investigated how to globally order transactions with multiple coordinators, for example by using loosely synchronized clocks. We leave selecting the best alternative to future work, and only evaluate a single-coordinator system in this paper.
The central coordinator divides the transaction into fragments and sends them to the partitions. When responses are received, the coordinator executes application code to determine how to continue the transaction, which may require sending more fragments. Each partition executes fragments for a given transaction sequentially.
Multi-partition transactions are executed using an undo buffer, and use two-phase commit (2PC) to decide the outcome. This allows each partition of the database to fail independently. If the transaction causes one partition to crash or the network splits during execution, other participants are able to recover and continue processing transactions that do not depend on the failed partition. Without undo information, the system would need to block until the failure is repaired.
The coordinator piggybacks the 2PC “prepare” message onto the last fragment of a transaction. When the primary receives the final fragment, it sends all the fragments of the transaction to the backups and waits for acknowledgments before sending the final results to the coordinator. This is equivalent to forcing the participant's 2PC vote to disk. Finally, when the coordinator has all the votes from the participants, it completes the transaction by sending a “commit” message to the partitions and returning the final result to the application.
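A condensed sketch of that message flow (in-process calls standing in for network messages; the names are our own):

    # Sketch: the "prepare" rides on the last fragment; each primary replicates
    # the transaction's fragments to its backups before voting, and the
    # coordinator commits once every participant has voted to commit.
    def handle_final_fragment(fragments, backups, execute):
        results = [execute(f) for f in fragments]
        for backup in backups:            # replicating to the backups is the
            backup.extend(fragments)      # moral equivalent of forcing the
        return "vote-commit", results     # 2PC vote to disk

    def coordinator_run(fragments_by_partition, participants):
        votes, results = [], []
        for part in participants:
            vote, res = handle_final_fragment(
                fragments_by_partition[part["id"]], part["backups"], part["execute"])
            votes.append(vote)
            results.extend(res)
        decision = "commit" if all(v == "vote-commit" for v in votes) else "abort"
        for part in participants:
            part["decision"] = decision   # partitions apply or undo accordingly
        return results if decision == "commit" else None

    execute = lambda fragment: ("done", fragment)
    participants = [{"id": "P1", "backups": [[]], "execute": execute},
                    {"id": "P2", "backups": [[]], "execute": execute}]
    print(coordinator_run({"P1": ["read X"], "P2": ["write Y"]}, participants))
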
When executing multi-partition transactions, network stalls can occur while waiting for data from other partitions. This idle time can introduce a performance bottleneck, even if multi-partition transactions only comprise a small fraction of the workload. On our experimental systems, described in Section 5, the minimum network round-trip time between two machines connected to the same gigabit Ethernet switch was measured using ping to be approximately 40 µs. The average CPU time for a TPC-C transaction in our system is 26 µs. Thus, while waiting for a network acknowledgment, the partition could execute at least two single partition transactions. Some form of concurrency control is needed to permit the engine to do useful work while otherwise idle. The challenge is to not reduce the efficiency of simple single partition transactions. The next section describes two concurrency control schemes we have developed to address this issue.



I don't dare translate this -

