In the previous post I ran PostgreSQL on an AWS m6gd.2xlarge instance (ARM Graviton2 processor).
I didn't specify the compilation options, so this post gives more details, following this feedback:
https://twitter.com/N_B__N_B/status/1369180884608315398
First, the PostgreSQL ./configure correctly detected ARM and compiled with the following flag: -march=armv8-a+crc
This is the base ARM v8 architecture. However, LSE (Large System Extensions), which provides atomic instructions, was added later in ARM v8.1 and can make a huge difference for PostgreSQL, especially with spinlocks under high CPU usage.
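Before looking at the binaries, it can help to check whether the CPU itself implements LSE. As far as I know, the Linux arm64 kernel reports it as the `atomics` flag on the Features line of /proc/cpuinfo. Here is a minimal sketch under that assumption; the function name `has_lse` and its optional file argument are my own, added only so that it can be tried on a saved copy of cpuinfo:

```shell
# has_lse: succeed when a cpuinfo-style file advertises LSE atomics.
# Assumption: the arm64 kernel exposes LSE as the "atomics" feature flag.
# $1 is optional and defaults to /proc/cpuinfo (hypothetical helper, not from the post).
has_lse() {
  grep -m1 '^Features' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qw 'atomics'
}

if has_lse; then
  echo "LSE atomics available"
else
  echo "no LSE atomics reported"
fi
```

I would expect Graviton2 to report the flag and the first-generation Graviton not to, but this is worth verifying on the instance rather than taking my word for it.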
I followed the information in https://github.com/aws/aws-graviton-getting-started/blob/master/c-c++.md to check the binaries after compilation.
for i in $(find postgres/src/backend -name "*.o") ; do objdump -d "$i" | awk '/:$/{w=$2}/aarch64_(cas|casp|swp|ldadd|stadd|ldclr|stclr|ldeor|steor|ldset|stset|ldsmax|stsmax|ldsmin|stsmin|ldumax|stumax|ldumin|stumin)/{printf "%-27s %-20s %-30s %-60s\n","(LSE instructions)",$NF,w,f}' f="$i" ; done | sort | uniq -c | sort -rnk1,4
8 (LSE instructions) <__aarch64_swp4_acq> <StartupXLOG>: postgres/src/backend/access/transam/xlog.o
7 (LSE instructions) <__aarch64_swp4_acq> <BitmapHeapNext>: postgres/src/backend/executor/nodeBitmapHeapscan.o
6 (LSE instructions) <__aarch64_ldclr4_acq_rel> <LWLockDequeueSelf>: postgres/src/backend/storage/lmgr/lwlock.o
6 (LSE instructions) <__aarch64_cas8_acq_rel> <shm_mq_send_bytes>: postgres/src/backend/storage/ipc/shm_mq.o
5 (LSE instructions) <__aarch64_swp4_acq> <WalReceiverMain>: postgres/src/backend/replication/walreceiver.o
5 (LSE instructions) <__aarch64_cas8_acq_rel> <shm_mq_receive_bytes.isra.0>: postgres/src/backend/storage/ipc/shm_mq.o
4 (LSE instructions) <__aarch64_swp4_acq> <ProcessRepliesIfAny>: postgres/src/backend/replication/walsender.o
4 (LSE instructions) <__aarch64_swp4_acq> <hash_search_with_hash_value>: postgres/src/backend/utils/hash/dynahash.o
4 (LSE instructions) <__aarch64_swp4_acq> <copy_replication_slot>: postgres/src/backend/replication/slotfuncs.o
4 (LSE instructions) <__aarch64_ldadd4_acq_rel> <parallel_vacuum_index>: postgres/src/backend/access/heap/vacuumlazy.o
4 (LSE instructions) <__aarch64_cas4_acq_rel> <LWLockAcquire>: postgres/src/backend/storage/lmgr/lwlock.o
3 (LSE instructions) <__aarch64_swp4_acq> <xlog_redo>: postgres/src/backend/access/transam/xlog.o
3 (LSE instructions) <__aarch64_swp4_acq> <XLogInsertRecord>: postgres/src/backend/access/transam/xlog.o
3 (LSE instructions) <__aarch64_swp4_acq> <SaveSlotToPath>: postgres/src/backend/replication/slot.o
3 (LSE instructions) <__aarch64_swp4_acq> <RequestCheckpoint>: postgres/src/backend/postmaster/checkpointer.o
3 (LSE instructions) <__aarch64_swp4_acq> <LogicalRepSyncTableStart>: postgres/src/backend/replication/logical/tablesync.o
3 (LSE instructions) <__aarch64_swp4_acq> <LogicalConfirmReceivedLocation>: postgres/src/backend/replication/logical/logical.o
3 (LSE instructions) <__aarch64_swp4_acq> <InvalidateObsoleteReplicationSlots>: postgres/src/backend/replication/slot.o
3 (LSE instructions) <__aarch64_swp4_acq> <CreateInitDecodingContext>: postgres/src/backend/replication/logical/logical.o
3 (LSE instructions) <__aarch64_swp4_acq> <CreateCheckPoint>: postgres/src/backend/access/transam/xlog.o
3 (LSE instructions) <__aarch64_swp4_acq> <CheckpointerMain>: postgres/src/backend/postmaster/checkpointer.o
3 (LSE instructions) <__aarch64_ldclr4_acq_rel> <LWLockQueueSelf>: postgres/src/backend/storage/lmgr/lwlock.o
3 (LSE instructions) <__aarch64_ldadd4_acq_rel> <tbm_prepare_shared_iterate>: postgres/src/backend/nodes/tidbitmap.o
3 (LSE instructions) <__aarch64_ldadd4_acq_rel> <tbm_free_shared_area>: postgres/src/backend/nodes/tidbitmap.o
3 (LSE instructions) <__aarch64_cas8_acq_rel> <ProcessProcSignalBarrier>: postgres/src/backend/storage/ipc/procsignal.o
3 (LSE instructions) <__aarch64_cas8_acq_rel> <ExecParallelHashIncreaseNumBatches>: postgres/src/backend/executor/nodeHash.o
2 (LSE instructions) <__aarch64_swp4_acq> <XLogWrite>: postgres/src/backend/access/transam/xlog.o
2 (LSE instructions) <__aarch64_swp4_acq> <XLogSendPhysical>: postgres/src/backend/replication/walsender.o
2 (LSE instructions) <__aarch64_swp4_acq> <XLogBackgroundFlush>: postgres/src/backend/access/transam/xlog.o
2 (LSE instructions) <__aarch64_swp4_acq> <WalRcvStreaming>: postgres/src/backend/replication/walreceiverfuncs.o
2 (LSE instructions) <__aarch64_swp4_acq> <WalRcvRunning>: postgres/src/backend/replication/walreceiverfuncs.o
2 (LSE instructions) <__aarch64_swp4_acq> <WalRcvDie>: postgres/src/backend/replication/walreceiver.o
2 (LSE instructions) <__aarch64_swp4_acq> <TransactionIdLimitedForOldSnapshots>: postgres/src/backend/utils/time/snapmgr.o
2 (LSE instructions) <__aarch64_swp4_acq> <StrategyGetBuffer>: postgres/src/backend/storage/buffer/freelist.o
2 (LSE instructions) <__aarch64_swp4_acq> <shm_mq_wait_internal>: postgres/src/backend/storage/ipc/shm_mq.o
2 (LSE instructions) <__aarch64_swp4_acq> <ReplicationSlotReserveWal>: postgres/src/backend/replication/slot.o
2 (LSE instructions) <__aarch64_swp4_acq> <ReplicationSlotRelease>: postgres/src/backend/replication/slot.o
2 (LSE instructions) <__aarch64_swp4_acq> <ProcKill>: postgres/src/backend/storage/lmgr/proc.o
2 (LSE instructions) <__aarch64_swp4_acq> <process_syncing_tables>: postgres/src/backend/replication/logical/tablesync.o
2 (LSE instructions) <__aarch64_swp4_acq> <pg_get_replication_slots>: postgres/src/backend/replication/slotfuncs.o
2 (LSE instructions) <__aarch64_swp4_acq> <exec_replication_command>: postgres/src/backend/replication/walsender.o
2 (LSE instructions) <__aarch64_swp4_acq> <CreateRestartPoint>: postgres/src/backend/access/transam/xlog.o
2 (LSE instructions) <__aarch64_swp4_acq> <ConditionVariableBroadcast>: postgres/src/backend/storage/lmgr/condition_variable.o
2 (LSE instructions) <__aarch64_swp4_acq> <BarrierArriveAndWait>: postgres/src/backend/storage/ipc/barrier.o
2 (LSE instructions) <__aarch64_ldset4_acq_rel> <LWLockWaitListLock>: postgres/src/backend/storage/lmgr/lwlock.o
2 (LSE instructions) <__aarch64_ldclr4_acq_rel> <LWLockWaitForVar>: postgres/src/backend/storage/lmgr/lwlock.o
2 (LSE instructions) <__aarch64_ldclr4_acq_rel> <LWLockUpdateVar>: postgres/src/backend/storage/lmgr/lwlock.o
2 (LSE instructions) <__aarch64_ldadd4_acq_rel> <vacuum_delay_point>: postgres/src/backend/commands/vacuum.o
2 (LSE instructions) <__aarch64_ldadd4_acq_rel> <StrategyGetBuffer>: postgres/src/backend/storage/buffer/freelist.o
2 (LSE instructions) <__aarch64_ldadd4_acq_rel> <LWLockRelease>: postgres/src/backend/storage/lmgr/lwlock.o
2 (LSE instructions) <__aarch64_ldadd4_acq_rel> <lazy_parallel_vacuum_indexes>: postgres/src/backend/access/heap/vacuumlazy.o
2 (LSE instructions) <__aarch64_cas8_acq_rel> <WalReceiverMain>: postgres/src/backend/replication/walreceiver.o
2 (LSE instructions) <__aarch64_cas8_acq_rel> <WaitForProcSignalBarrier>: postgres/src/backend/storage/ipc/procsignal.o
2 (LSE instructions) <__aarch64_cas8_acq_rel> <shm_mq_receive>: postgres/src/backend/storage/ipc/shm_mq.o
2 (LSE instructions) <__aarch64_cas8_acq_rel> <ResolveRecoveryConflictWithLock>: postgres/src/backend/storage/ipc/standby.o
2 (LSE instructions) <__aarch64_cas8_acq_rel> <ProcSignalInit>: postgres/src/backend/storage/ipc/procsignal.o
2 (LSE instructions) <__aarch64_cas8_acq_rel> <ExecParallelHashTableInsert>: postgres/src/backend/executor/nodeHash.o
2 (LSE instructions) <__aarch64_cas8_acq_rel> <ExecParallelHashTableInsertCurrentBatch>: postgres/src/backend/executor/nodeHash.o
2 (LSE instructions) <__aarch64_cas8_acq_rel> <ExecParallelHashIncreaseNumBuckets>: postgres/src/backend/executor/nodeHash.o
2 (LSE instructions) <__aarch64_cas4_acq_rel> <TransactionIdSetTreeStatus>: postgres/src/backend/access/transam/clog.o
2 (LSE instructions) <__aarch64_cas4_acq_rel> <ProcArrayEndTransaction>: postgres/src/backend/storage/ipc/procarray.o
2 (LSE instructions) <__aarch64_cas4_acq_rel> <LWLockAcquireOrWait>: postgres/src/backend/storage/lmgr/lwlock.o
1 (LSE instructions) <__aarch64_swp4_acq> <XLogWalRcvFlush.part.4>: postgres/src/backend/replication/walreceiver.o
1 (LSE instructions) <__aarch64_swp4_acq> <XLogSetReplicationSlotMinimumLSN>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <XLogSetAsyncXactLSN>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <XLogSendLogical>: postgres/src/backend/replication/walsender.o
1 (LSE instructions) <__aarch64_swp4_acq> <XLogPageRead>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <XLogNeedsFlush>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <XLogGetLastRemovedSegno>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <XLogFlush>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <worker_freeze_result_tape>: postgres/src/backend/utils/sort/tuplesort.o
1 (LSE instructions) <__aarch64_swp4_acq> <WalSndWakeup>: postgres/src/backend/replication/walsender.o
1 (LSE instructions) <__aarch64_swp4_acq> <WalSndWaitStopping>: postgres/src/backend/replication/walsender.o
1 (LSE instructions) <__aarch64_swp4_acq> <WalSndSetState>: postgres/src/backend/replication/walsender.o
1 (LSE instructions) <__aarch64_swp4_acq> <WalSndRqstFileReload>: postgres/src/backend/replication/walsender.o
1 (LSE instructions) <__aarch64_swp4_acq> <WalSndKill>: postgres/src/backend/replication/walsender.o
1 (LSE instructions) <__aarch64_swp4_acq> <WalSndInitStopping>: postgres/src/backend/replication/walsender.o
1 (LSE instructions) <__aarch64_swp4_acq> <WalRcvForceReply>: postgres/src/backend/replication/walreceiver.o
1 (LSE instructions) <__aarch64_swp4_acq> <WaitXLogInsertionsToFinish>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <UpdateMinRecoveryPoint.part.10>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <tuplesort_performsort>: postgres/src/backend/utils/sort/tuplesort.o
1 (LSE instructions) <__aarch64_swp4_acq> <tuplesort_begin_common>: postgres/src/backend/utils/sort/tuplesort.o
1 (LSE instructions) <__aarch64_swp4_acq> <table_block_parallelscan_startblock_init>: postgres/src/backend/access/table/tableam.o
1 (LSE instructions) <__aarch64_swp4_acq> <SyncRepInitConfig>: postgres/src/backend/replication/syncrep.o
1 (LSE instructions) <__aarch64_swp4_acq> <SyncRepGetCandidateStandbys>: postgres/src/backend/replication/syncrep.o
1 (LSE instructions) <__aarch64_swp4_acq> <StrategySyncStart>: postgres/src/backend/storage/buffer/freelist.o
1 (LSE instructions) <__aarch64_swp4_acq> <StrategyNotifyBgWriter>: postgres/src/backend/storage/buffer/freelist.o
1 (LSE instructions) <__aarch64_swp4_acq> <StrategyFreeBuffer>: postgres/src/backend/storage/buffer/freelist.o
1 (LSE instructions) <__aarch64_swp4_acq> <SnapshotTooOldMagicForTest>: postgres/src/backend/utils/time/snapmgr.o
1 (LSE instructions) <__aarch64_swp4_acq> <s_lock>: postgres/src/backend/storage/lmgr/s_lock.o
1 (LSE instructions) <__aarch64_swp4_acq> <SIInsertDataEntries>: postgres/src/backend/storage/ipc/sinvaladt.o
1 (LSE instructions) <__aarch64_swp4_acq> <SIGetDataEntries>: postgres/src/backend/storage/ipc/sinvaladt.o
1 (LSE instructions) <__aarch64_swp4_acq> <ShutdownWalRcv>: postgres/src/backend/replication/walreceiverfuncs.o
1 (LSE instructions) <__aarch64_swp4_acq> <shm_toc_insert>: postgres/src/backend/storage/ipc/shm_toc.o
1 (LSE instructions) <__aarch64_swp4_acq> <shm_toc_freespace>: postgres/src/backend/storage/ipc/shm_toc.o
1 (LSE instructions) <__aarch64_swp4_acq> <shm_toc_allocate>: postgres/src/backend/storage/ipc/shm_toc.o
1 (LSE instructions) <__aarch64_swp4_acq> <shm_mq_set_sender>: postgres/src/backend/storage/ipc/shm_mq.o
1 (LSE instructions) <__aarch64_swp4_acq> <shm_mq_set_receiver>: postgres/src/backend/storage/ipc/shm_mq.o
1 (LSE instructions) <__aarch64_swp4_acq> <shm_mq_sendv>: postgres/src/backend/storage/ipc/shm_mq.o
1 (LSE instructions) <__aarch64_swp4_acq> <shm_mq_get_sender>: postgres/src/backend/storage/ipc/shm_mq.o
1 (LSE instructions) <__aarch64_swp4_acq> <shm_mq_get_receiver>: postgres/src/backend/storage/ipc/shm_mq.o
1 (LSE instructions) <__aarch64_swp4_acq> <shm_mq_detach_internal>: postgres/src/backend/storage/ipc/shm_mq.o
1 (LSE instructions) <__aarch64_swp4_acq> <ShmemAllocRaw>: postgres/src/backend/storage/ipc/shmem.o
1 (LSE instructions) <__aarch64_swp4_acq> <SharedFileSetOnDetach>: postgres/src/backend/storage/file/sharedfileset.o
1 (LSE instructions) <__aarch64_swp4_acq> <SharedFileSetAttach>: postgres/src/backend/storage/file/sharedfileset.o
1 (LSE instructions) <__aarch64_swp4_acq> <SetWalWriterSleeping>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <SetRecoveryPause>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <SetPromoteIsTriggered>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <SetOldSnapshotThresholdTimestamp>: postgres/src/backend/utils/time/snapmgr.o
1 (LSE instructions) <__aarch64_swp4_acq> <RequestXLogStreaming>: postgres/src/backend/replication/walreceiverfuncs.o
1 (LSE instructions) <__aarch64_swp4_acq> <ReplicationSlotsDropDBSlots>: postgres/src/backend/replication/slot.o
1 (LSE instructions) <__aarch64_swp4_acq> <ReplicationSlotsCountDBSlots>: postgres/src/backend/replication/slot.o
1 (LSE instructions) <__aarch64_swp4_acq> <ReplicationSlotsComputeRequiredXmin>: postgres/src/backend/replication/slot.o
1 (LSE instructions) <__aarch64_swp4_acq> <ReplicationSlotsComputeRequiredLSN>: postgres/src/backend/replication/slot.o
1 (LSE instructions) <__aarch64_swp4_acq> <ReplicationSlotsComputeLogicalRestartLSN>: postgres/src/backend/replication/slot.o
1 (LSE instructions) <__aarch64_swp4_acq> <ReplicationSlotPersist>: postgres/src/backend/replication/slot.o
1 (LSE instructions) <__aarch64_swp4_acq> <ReplicationSlotMarkDirty>: postgres/src/backend/replication/slot.o
1 (LSE instructions) <__aarch64_swp4_acq> <ReplicationSlotDropPtr>: postgres/src/backend/replication/slot.o
1 (LSE instructions) <__aarch64_swp4_acq> <ReplicationSlotCreate>: postgres/src/backend/replication/slot.o
1 (LSE instructions) <__aarch64_swp4_acq> <ReplicationSlotCleanup>: postgres/src/backend/replication/slot.o
1 (LSE instructions) <__aarch64_swp4_acq> <ReplicationSlotAcquireInternal>: postgres/src/backend/replication/slot.o
1 (LSE instructions) <__aarch64_swp4_acq> <RemoveOldXlogFiles>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <RemoveLocalLock>: postgres/src/backend/storage/lmgr/lock.o
1 (LSE instructions) <__aarch64_swp4_acq> <RecoveryRestartPoint>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <RecoveryIsPaused>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <ReadRecord>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <PublishStartupProcessInformation>: postgres/src/backend/storage/lmgr/proc.o
1 (LSE instructions) <__aarch64_swp4_acq> <PromoteIsTriggered>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <ProcSendSignal>: postgres/src/backend/storage/lmgr/proc.o
1 (LSE instructions) <__aarch64_swp4_acq> <ProcessWalSndrMessage>: postgres/src/backend/replication/walreceiver.o
1 (LSE instructions) <__aarch64_swp4_acq> <PhysicalReplicationSlotNewXmin>: postgres/src/backend/replication/walsender.o
1 (LSE instructions) <__aarch64_swp4_acq> <pg_stat_get_wal_senders>: postgres/src/backend/replication/walsender.o
1 (LSE instructions) <__aarch64_swp4_acq> <pg_stat_get_wal_receiver>: postgres/src/backend/replication/walreceiver.o
1 (LSE instructions) <__aarch64_swp4_acq> <pg_replication_slot_advance>: postgres/src/backend/replication/slotfuncs.o
1 (LSE instructions) <__aarch64_swp4_acq> <ParallelWorkerReportLastRecEnd>: postgres/src/backend/access/transam/parallel.o
1 (LSE instructions) <__aarch64_swp4_acq> <MaintainOldSnapshotTimeMapping>: postgres/src/backend/utils/time/snapmgr.o
1 (LSE instructions) <__aarch64_swp4_acq> <LWLockNewTrancheId>: postgres/src/backend/storage/lmgr/lwlock.o
1 (LSE instructions) <__aarch64_swp4_acq> <LogicalIncreaseXminForSlot>: postgres/src/backend/replication/logical/logical.o
1 (LSE instructions) <__aarch64_swp4_acq> <LogicalIncreaseRestartDecodingForSlot>: postgres/src/backend/replication/logical/logical.o
1 (LSE instructions) <__aarch64_swp4_acq> <lock_twophase_recover>: postgres/src/backend/storage/lmgr/lock.o
1 (LSE instructions) <__aarch64_swp4_acq> <LockRefindAndRelease>: postgres/src/backend/storage/lmgr/lock.o
1 (LSE instructions) <__aarch64_swp4_acq> <LockAcquireExtended>: postgres/src/backend/storage/lmgr/lock.o
1 (LSE instructions) <__aarch64_swp4_acq> <KnownAssignedXidsSearch>: postgres/src/backend/storage/ipc/procarray.o
1 (LSE instructions) <__aarch64_swp4_acq> <KnownAssignedXidsGetAndSetXmin>: postgres/src/backend/storage/ipc/procarray.o
1 (LSE instructions) <__aarch64_swp4_acq> <KnownAssignedXidsAdd>: postgres/src/backend/storage/ipc/procarray.o
1 (LSE instructions) <__aarch64_swp4_acq> <KeepLogSeg>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <InitWalSender>: postgres/src/backend/replication/walsender.o
1 (LSE instructions) <__aarch64_swp4_acq> <InitProcess>: postgres/src/backend/storage/lmgr/proc.o
1 (LSE instructions) <__aarch64_swp4_acq> <InitAuxiliaryProcess>: postgres/src/backend/storage/lmgr/proc.o
1 (LSE instructions) <__aarch64_swp4_acq> <HotStandbyActive>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <HaveNFreeProcs>: postgres/src/backend/storage/lmgr/proc.o
1 (LSE instructions) <__aarch64_swp4_acq> <GetXLogWriteRecPtr>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <GetXLogReplayRecPtr>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <GetXLogInsertRecPtr>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <GetWalRcvFlushRecPtr>: postgres/src/backend/replication/walreceiverfuncs.o
1 (LSE instructions) <__aarch64_swp4_acq> <GetSnapshotCurrentTimestamp>: postgres/src/backend/utils/time/snapmgr.o
1 (LSE instructions) <__aarch64_swp4_acq> <GetReplicationTransferLatency>: postgres/src/backend/replication/walreceiverfuncs.o
1 (LSE instructions) <__aarch64_swp4_acq> <GetReplicationApplyDelay>: postgres/src/backend/replication/walreceiverfuncs.o
1 (LSE instructions) <__aarch64_swp4_acq> <GetRedoRecPtr>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <GetRecoveryState>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <GetOldSnapshotThresholdTimestamp>: postgres/src/backend/utils/time/snapmgr.o
1 (LSE instructions) <__aarch64_swp4_acq> <GetLatestXTime>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <GetInsertRecPtr>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <GetFlushRecPtr>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <GetFakeLSNForUnloggedRel>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <GetCurrentChunkReplayStartTime>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <FirstCallSinceLastCheckpoint>: postgres/src/backend/postmaster/checkpointer.o
1 (LSE instructions) <__aarch64_swp4_acq> <element_alloc>: postgres/src/backend/utils/hash/dynahash.o
1 (LSE instructions) <__aarch64_swp4_acq> <do_pg_stop_backup>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <do_pg_start_backup>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <DecodingContextFindStartpoint>: postgres/src/backend/replication/logical/logical.o
1 (LSE instructions) <__aarch64_swp4_acq> <ConditionVariableTimedSleep>: postgres/src/backend/storage/lmgr/condition_variable.o
1 (LSE instructions) <__aarch64_swp4_acq> <ConditionVariableSignal>: postgres/src/backend/storage/lmgr/condition_variable.o
1 (LSE instructions) <__aarch64_swp4_acq> <ConditionVariablePrepareToSleep>: postgres/src/backend/storage/lmgr/condition_variable.o
1 (LSE instructions) <__aarch64_swp4_acq> <ConditionVariableCancelSleep>: postgres/src/backend/storage/lmgr/condition_variable.o
1 (LSE instructions) <__aarch64_swp4_acq> <ComputeXidHorizons>: postgres/src/backend/storage/ipc/procarray.o
1 (LSE instructions) <__aarch64_swp4_acq> <CheckXLogRemoved>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <CheckRecoveryConsistency.part.11>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <_bt_parallel_seize>: postgres/src/backend/access/nbtree/nbtree.o
1 (LSE instructions) <__aarch64_swp4_acq> <_bt_parallel_scan_and_sort>: postgres/src/backend/access/nbtree/nbtsort.o
1 (LSE instructions) <__aarch64_swp4_acq> <btparallelrescan>: postgres/src/backend/access/nbtree/nbtree.o
1 (LSE instructions) <__aarch64_swp4_acq> <_bt_parallel_release>: postgres/src/backend/access/nbtree/nbtree.o
1 (LSE instructions) <__aarch64_swp4_acq> <_bt_parallel_done>: postgres/src/backend/access/nbtree/nbtree.o
1 (LSE instructions) <__aarch64_swp4_acq> <_bt_parallel_advance_array_keys>: postgres/src/backend/access/nbtree/nbtree.o
1 (LSE instructions) <__aarch64_swp4_acq> <btbuild>: postgres/src/backend/access/nbtree/nbtsort.o
1 (LSE instructions) <__aarch64_swp4_acq> <BarrierParticipants>: postgres/src/backend/storage/ipc/barrier.o
1 (LSE instructions) <__aarch64_swp4_acq> <BarrierDetach>: postgres/src/backend/storage/ipc/barrier.o
1 (LSE instructions) <__aarch64_swp4_acq> <BarrierAttach>: postgres/src/backend/storage/ipc/barrier.o
1 (LSE instructions) <__aarch64_swp4_acq> <BarrierArriveAndDetach>: postgres/src/backend/storage/ipc/barrier.o
1 (LSE instructions) <__aarch64_swp4_acq> <BarrierArriveAndDetachExceptLast>: postgres/src/backend/storage/ipc/barrier.o
1 (LSE instructions) <__aarch64_swp4_acq> <AuxiliaryProcKill>: postgres/src/backend/storage/lmgr/proc.o
1 (LSE instructions) <__aarch64_swp4_acq> <AdvanceXLInsertBuffer>: postgres/src/backend/access/transam/xlog.o
1 (LSE instructions) <__aarch64_swp4_acq> <AbortStrongLockAcquire>: postgres/src/backend/storage/lmgr/lock.o
1 (LSE instructions) <__aarch64_ldset4_acq_rel> <ProcessProcSignalBarrier>: postgres/src/backend/storage/ipc/procsignal.o
1 (LSE instructions) <__aarch64_ldset4_acq_rel> <LWLockWaitForVar>: postgres/src/backend/storage/lmgr/lwlock.o
1 (LSE instructions) <__aarch64_ldset4_acq_rel> <LWLockQueueSelf>: postgres/src/backend/storage/lmgr/lwlock.o
1 (LSE instructions) <__aarch64_ldset4_acq_rel> <LWLockDequeueSelf>: postgres/src/backend/storage/lmgr/lwlock.o
1 (LSE instructions) <__aarch64_ldset4_acq_rel> <LWLockAcquire>: postgres/src/backend/storage/lmgr/lwlock.o
1 (LSE instructions) <__aarch64_ldset4_acq_rel> <LockBufHdr>: postgres/src/backend/storage/buffer/bufmgr.o
1 (LSE instructions) <__aarch64_ldset4_acq_rel> <EmitProcSignalBarrier>: postgres/src/backend/storage/ipc/procsignal.o
1 (LSE instructions) <__aarch64_ldclr4_acq_rel> <LWLockReleaseClearVar>: postgres/src/backend/storage/lmgr/lwlock.o
1 (LSE instructions) <__aarch64_ldadd8_acq_rel> <table_block_parallelscan_nextpage>: postgres/src/backend/access/table/tableam.o
1 (LSE instructions) <__aarch64_ldadd8_acq_rel> <EmitProcSignalBarrier>: postgres/src/backend/storage/ipc/procsignal.o
1 (LSE instructions) <__aarch64_ldadd4_acq_rel> <find_or_make_matching_shared_tupledesc>: postgres/src/backend/utils/cache/typcache.o
1 (LSE instructions) <__aarch64_ldadd4_acq_rel> <ExecParallelHashJoin>: postgres/src/backend/executor/nodeHashjoin.o
1 (LSE instructions) <__aarch64_cas8_acq_rel> <table_block_parallelscan_reinitialize>: postgres/src/backend/access/table/tableam.o
1 (LSE instructions) <__aarch64_cas8_acq_rel> <ProcWakeup>: postgres/src/backend/storage/lmgr/proc.o
1 (LSE instructions) <__aarch64_cas8_acq_rel> <ProcSleep>: postgres/src/backend/storage/lmgr/proc.o
1 (LSE instructions) <__aarch64_cas8_acq_rel> <pg_stat_get_wal_receiver>: postgres/src/backend/replication/walreceiver.o
1 (LSE instructions) <__aarch64_cas8_acq_rel> <InitProcess>: postgres/src/backend/storage/lmgr/proc.o
1 (LSE instructions) <__aarch64_cas8_acq_rel> <InitAuxiliaryProcess>: postgres/src/backend/storage/lmgr/proc.o
1 (LSE instructions) <__aarch64_cas8_acq_rel> <GetWalRcvWriteRecPtr>: postgres/src/backend/replication/walreceiverfuncs.o
1 (LSE instructions) <__aarch64_cas8_acq_rel> <GetLockStatusData>: postgres/src/backend/storage/lmgr/lock.o
1 (LSE instructions) <__aarch64_cas8_acq_rel> <ExecParallelScanHashBucket>: postgres/src/backend/executor/nodeHash.o
1 (LSE instructions) <__aarch64_cas8_acq_rel> <CleanupProcSignalState>: postgres/src/backend/storage/ipc/procsignal.o
1 (LSE instructions) <__aarch64_cas4_acq_rel> <UnpinBuffer.constprop.11>: postgres/src/backend/storage/buffer/bufmgr.o
1 (LSE instructions) <__aarch64_cas4_acq_rel> <StrategySyncStart>: postgres/src/backend/storage/buffer/freelist.o
1 (LSE instructions) <__aarch64_cas4_acq_rel> <StrategyGetBuffer>: postgres/src/backend/storage/buffer/freelist.o
1 (LSE instructions) <__aarch64_cas4_acq_rel> <ProcessProcSignalBarrier>: postgres/src/backend/storage/ipc/procsignal.o
1 (LSE instructions) <__aarch64_cas4_acq_rel> <PinBuffer>: postgres/src/backend/storage/buffer/bufmgr.o
1 (LSE instructions) <__aarch64_cas4_acq_rel> <MarkBufferDirty>: postgres/src/backend/storage/buffer/bufmgr.o
1 (LSE instructions) <__aarch64_cas4_acq_rel> <LWLockRelease>: postgres/src/backend/storage/lmgr/lwlock.o
1 (LSE instructions) <__aarch64_cas4_acq_rel> <LWLockConditionalAcquire>: postgres/src/backend/storage/lmgr/lwlock.o
So, this confirms that it was compiled with -march=armv8-a and outlined atomics (-moutline-atomics, which is the default in GCC >= 10 and also in the GCC 7 shipped with Amazon Linux 2). The calls to the LSE (Large System Extensions) helpers are there, and we can see where the atomic instructions are used: WAL and buffer lightweight locks that protect access to shared memory.
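The per-function listing is long, so a total per helper can be easier to read. This is a small sketch that sums the first column of the listing by helper name; `summarize_helpers` is a name I made up, and it assumes the exact "count (LSE instructions) <helper> <function>: file" layout printed above:

```shell
# summarize_helpers: read the listing on stdin and print the total number of
# call sites per outline-atomics helper, highest first.
# Hypothetical helper; relies on the column layout of the listing above.
summarize_helpers() {
  awk '$2 == "(LSE" { calls[$4] += $1 }
       END { for (h in calls) print calls[h], h }' | sort -rn
}

# Example with three lines taken from the listing above.
summarize_helpers <<'EOF'
8 (LSE instructions) <__aarch64_swp4_acq> <StartupXLOG>: postgres/src/backend/access/transam/xlog.o
6 (LSE instructions) <__aarch64_ldclr4_acq_rel> <LWLockDequeueSelf>: postgres/src/backend/storage/lmgr/lwlock.o
7 (LSE instructions) <__aarch64_swp4_acq> <BitmapHeapNext>: postgres/src/backend/executor/nodeBitmapHeapscan.o
EOF
```

For this three-line sample it prints 15 for `<__aarch64_swp4_acq>` and 6 for `<__aarch64_ldclr4_acq_rel>`. Piping the full output of the loop into the function gives the totals for the whole backend.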
for i in /usr/local/pgsql/bin/postgres $(find postgres/src/backend -name "*.o") ; do objdump -d "$i" | awk '/:$/{w=$2}/aarch64_(cas|casp|swp|ldadd|stadd|ldclr|stclr|ldeor|steor|ldset|stset|ldsmax|stsmax|ldsmin|stsmin|ldumax|stumax|ldumin|stumin)/{printf "%-27s %-40s %-40s %-60s\n","(LSE instructions)",$NF,w,f}/\t(ldxr|ldaxr|stxr|stlxr)\t/{printf "%-27s %-40s %-40s %-60s\n","(load and store exclusives)",$3,w,f}' f="$i" ; done | sort | uniq -c | sort -rn
1 (load and store exclusives) stxr <__aarch64_swp4_acq>: /usr/local/pgsql/bin/postgres
1 (load and store exclusives) stlxr <__aarch64_ldset4_acq_rel>: /usr/local/pgsql/bin/postgres
1 (load and store exclusives) stlxr <__aarch64_ldclr4_acq_rel>: /usr/local/pgsql/bin/postgres
1 (load and store exclusives) stlxr <__aarch64_ldadd8_acq_rel>: /usr/local/pgsql/bin/postgres
1 (load and store exclusives) stlxr <__aarch64_ldadd4_acq_rel>: /usr/local/pgsql/bin/postgres
1 (load and store exclusives) stlxr <__aarch64_cas8_acq_rel>: /usr/local/pgsql/bin/postgres
1 (load and store exclusives) stlxr <__aarch64_cas4_acq_rel>: /usr/local/pgsql/bin/postgres
1 (load and store exclusives) ldaxr <__aarch64_swp4_acq>: /usr/local/pgsql/bin/postgres
1 (load and store exclusives) ldaxr <__aarch64_ldset4_acq_rel>: /usr/local/pgsql/bin/postgres
1 (load and store exclusives) ldaxr <__aarch64_ldclr4_acq_rel>: /usr/local/pgsql/bin/postgres
1 (load and store exclusives) ldaxr <__aarch64_ldadd8_acq_rel>: /usr/local/pgsql/bin/postgres
1 (load and store exclusives) ldaxr <__aarch64_ldadd4_acq_rel>: /usr/local/pgsql/bin/postgres
1 (load and store exclusives) ldaxr <__aarch64_cas8_acq_rel>: /usr/local/pgsql/bin/postgres
1 (load and store exclusives) ldaxr <__aarch64_cas4_acq_rel>: /usr/local/pgsql/bin/postgres
This confirms that the PostgreSQL binary also contains load and store exclusives, so the same binary can run on both Graviton and Graviton2.
[ec2-user@ip-172-31-11-116 ~]$ nm /usr/local/pgsql/bin/postgres | grep -E "aarch64(_have_lse_atomics)?"
00000000008fb460 t __aarch64_cas4_acq_rel
00000000008fb490 t __aarch64_cas8_acq_rel
0000000000bbe640 b __aarch64_have_lse_atomics
00000000008fb4f0 t __aarch64_ldadd4_acq_rel
00000000008fb580 t __aarch64_ldadd8_acq_rel
00000000008fb520 t __aarch64_ldclr4_acq_rel
00000000008fb550 t __aarch64_ldset4_acq_rel
00000000008fb4c0 t __aarch64_swp4_acq
This is the run-time detection. As it was compiled for ARM v8 with outlined atomics, the same binary can run on v8 or on v8.1 and later.
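The same check can be scripted. My understanding is that each `__aarch64_*` helper tests the `__aarch64_have_lse_atomics` flag and then either uses the LSE instruction or falls back to a load/store exclusive loop. The sketch below only looks at the symbol table, and `atomics_mode` and its three messages are my own naming:

```shell
# atomics_mode: classify a binary from its `nm` output read on stdin.
# Hypothetical helper; it only checks which symbols exist, not how they are used.
atomics_mode() {
  awk '
    $3 == "__aarch64_have_lse_atomics" { flag = 1 }
    $3 ~ /^__aarch64_(cas|swp|ldadd|ldclr|ldeor|ldset)/ { helpers++ }
    END {
      if (flag && helpers)
        print "outline atomics with run-time LSE detection (" helpers " helpers)"
      else if (helpers)
        print "outline helpers found, but no detection flag"
      else
        print "no outline-atomics helpers"
    }'
}

# Path taken from the post; skipped when the binary is not installed.
if [ -x /usr/local/pgsql/bin/postgres ]; then
  nm /usr/local/pgsql/bin/postgres | atomics_mode
fi
```

With the `nm` output shown above (seven helpers plus the flag), it reports outline atomics with run-time LSE detection.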
[ec2-user@ip-172-31-11-116 ~]$ gcc --version
gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
This is GCC 7, but on Amazon Linux 2 it has been patched to enable -moutline-atomics by default.
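One way to see what a given compiler does by default is to ask GCC for its target options with `gcc -Q --help=target`, which lists each option with its state. I am assuming here that the patched GCC 7 lists -moutline-atomics the same way GCC 10 and later do, which I have not verified. `outline_atomics_default` is a hypothetical helper name:

```shell
# outline_atomics_default: read `gcc -Q --help=target` output on stdin and
# print the state shown for -moutline-atomics, or "unknown" if it is not listed.
# Hypothetical helper; the exact output layout may differ between GCC builds.
outline_atomics_default() {
  awk '$1 == "-moutline-atomics" { print $2; found = 1 }
       END { if (!found) print "unknown" }'
}

gcc -Q --help=target 2>/dev/null | outline_atomics_default
```

On a compiler that does not know the option (an x86 build, for example) this prints "unknown".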
Install latest version of GCC (version 11 experimental)
Here is how I compiled the latest GCC available:
gcc --version
sudo yum -y install bzip2 git gcc gcc-c++ gmp-devel mpfr-devel libmpc-devel make flex bison
git clone https://github.com/gcc-mirror/gcc.git
cd gcc
make distclean
./configure --enable-languages=c,c++
make
sudo make install
This basically gets the latest GCC from source, compiles it and installs it (please remember this is a lab: use stable versions elsewhere).
[ec2-user@ip-172-31-38-254 ~]$ gcc --version
gcc (GCC) 11.0.1 20210309 (experimental)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Here we are: gcc 11.0.1 20210309 (experimental)
PGIO LIOPS
I'm running the same PGIO as in the previous post:
Date: Wed Mar 10 14:39:38 UTC 2021
Database connect string: "pgio".
Shared buffers: 8500MB.
Testing 4 schemas with 1 thread(s) accessing 1024M (131072 blocks) of each schema.
Running iostat, vmstat and mpstat on current host--in background.
Launching sessions. 4 schema(s) will be accessed by 1 thread(s) each.
pg_stat_database stats:
datname| blks_hit| blks_read|tup_returned|tup_fetched|tup_updated
BEFORE: pgio | 38262338086 | 562443 | 37644815538 | 37635763756 | 24
AFTER: pgio | 49691750429 | 562449 | 48890461241 | 48878858651 | 49
DBNAME: pgio. 4 schemas, 1 threads(each). Run time: 3600 seconds. RIOPS >793709<
This is a little higher than what I had: 793709 LIOPS per CPU, about 1.7% more than the 780651 I had with GCC 7, but still about 11% lower than the 896280 I had on x86.
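The figure reported by PGIO can be cross-checked from the pg_stat_database lines above: it is the blks_hit delta divided by the run time and by the number of threads. A minimal sketch with integer arithmetic, where `liops_per_thread` is just a name I picked:

```shell
# liops_per_thread BEFORE AFTER SECONDS THREADS
# Logical reads (blks_hit delta) per second and per thread, integer division.
# Hypothetical helper; the four values below come from the PGIO output above.
liops_per_thread() {
  echo $(( ($2 - $1) / $3 / $4 ))
}

liops_per_thread 38262338086 49691750429 3600 4   # prints 793709
```

This matches the value PGIO prints, which is why I read it as LIOPS per thread rather than a total for the 4 sessions.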
Of course, there can be more optimisations, as mentioned in https://github.com/aws/aws-graviton-getting-started/blob/master/c-c++.md
I'll recompile with the recommended flags:
(
cd postgres
CFLAGS="-march=armv8.2-a+fp16+rcpc+dotprod+crypto -mtune=neoverse-n1 -fsigned-char" ./configure
make clean
make
make install
)
It didn't make any difference in the PGIO run. Of course, this may change with a read-write workload (more spinlocks) with checksums.
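To check what the recompilation actually changed, the disassembly can be split into three groups: calls to the outline helpers, LSE instructions emitted inline, and load/store exclusives. My expectation, to be verified, is that with -march=armv8.2-a GCC emits LSE instructions directly instead of calling the helpers, and that such a binary no longer runs on a CPU without LSE. `classify_atomics` is a hypothetical helper, and its list of mnemonics is a subset I chose:

```shell
# classify_atomics: read `objdump -d` output on stdin and count three kinds of lines.
# Hypothetical helper; assumes objdump separates the mnemonic with tabs.
classify_atomics() {
  awk '
    /\tbl\t.*<__aarch64_(cas|swp|ldadd|ldclr|ldeor|ldset)/    { outline++ }
    /\t(casp?|swp|ldadd|ldclr|ldeor|ldset)(a|l|al)?[bh]?\t/   { inline_lse++ }
    /\t(ldxr|ldaxr|stxr|stlxr)\t/                             { excl++ }
    END {
      printf "outline helper calls: %d\n", outline
      printf "inline LSE instructions: %d\n", inline_lse
      printf "load/store exclusives: %d\n", excl
    }'
}

# Path taken from the post; skipped when the binary is not installed.
if [ -x /usr/local/pgsql/bin/postgres ]; then
  objdump -d /usr/local/pgsql/bin/postgres | classify_atomics
fi
```

Running it against the binary before and after the recompilation shows whether the atomics moved from helper calls to inline instructions.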
Note that I initially compiled with the default (empty) CFLAGS, so gcc was called with -march=armv8-a+crc (and -moutline-atomics is the default). I'm therefore in the same situation, with run-time detection, as with GCC >= 10, because that behaviour has been backported by Amazon to the GCC 7 in Amazon Linux 2. This was not clear to me initially (I got this clarified here).
By the way, Aurora on Graviton2 is still compiled with GCC 7.4
Update 15-MAY-2021: I have rephrased a few things here that were not clear (even to me), and I'll write more on PostgreSQL on ARM, and on benchmarks in general. http://blog.pachot.net should redirect to the right place (or @FranckPachot on Twitter, of course).