
AWS NVME Notes

On Nitro-based instances (m5, i3en, etc.), AWS presents all drives as NVMe devices. This provides better I/O performance, but it can complicate mapping a drive from Terraform or the web console to the device name the instance actually sees.

Determining EBS Mapping

The EBS mapping is stored in the vendor-specific portion of the NVMe identify-controller data. You can grab it by running:

nvme id-ctrl --raw-binary /dev/nvmeXnY | cut -c 3073-3104 | sed 's/ //g'
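
For example, for an EBS volume that was attached as /dev/sdf (a hypothetical attachment point, on a hypothetical second controller), the output is the device name from the block device mapping; the exact string (sdf, /dev/sdf, or xvdf) is whatever was specified when the volume was attached:

$ nvme id-ctrl --raw-binary /dev/nvme1n1 | cut -c 3073-3104 | sed 's/ //g'
sdf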

Useful Scripts

/usr/local/bin/ebs-nvme-mapping:

#!/bin/bash
#
# Creates symbolic links from the nvme drives to /dev/sdX or /dev/xvdX.
# This may not be compatible with instances that have local storage, such
# as i3's
PATH="${PATH}:/usr/sbin"

for blkdev in $( nvme list | awk '/^\/dev/ { print $1 }' ) ; do
	# Grab the desired device name from the vendor-specific identify data
	mapping=$(nvme id-ctrl --raw-binary "${blkdev}" 2>/dev/null | cut -c3073-3104 | sed 's/ //g')
	# Strip any leading /dev/ so we are left with the bare sdX/xvdX name
	mapping="${mapping#/dev/}"
	if [[ "${mapping}" == xvd* || "${mapping}" == sd* ]]; then
		( test -b "${blkdev}" && test -L "/dev/${mapping}" ) || ln -s "${blkdev}" "/dev/${mapping}"
		# Link each partition as well, e.g. nvme1n1p1 -> /dev/xvdf1
		for partition in "${blkdev}"p* ; do
			[[ -b "${partition}" ]] || continue
			ln -s "${partition}" "/dev/${mapping}${partition/${blkdev}p/}"
		done
	fi
done
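
The symlinks do not persist across reboots, and the NVMe enumeration order can change after a stop/start, so this script needs to run on every boot (from rc.local, a udev rule, or a small systemd unit, whichever fits your AMI). A quick manual sanity check, assuming at least one EBS volume attached as an xvdX device:

/usr/local/bin/ebs-nvme-mapping
ls -l /dev/xvd*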

/root/cf/define_logical_volume.sh:

#!/bin/bash
#
# Provides a simple way to create and format logical volumes on
# Nitro-based AWS instances based on their configured device mapping.
# NOTE: It does not create the fstab entry
#
# Syntax: define_logical_volume.sh <LABEL> <VOLGROUP> <LOGICALVOL> <DEVICE>
# Sample: define_logical_volume.sh SPLUNKFROZEN vg_frozen lv_frozen xvdj
LABEL=$1
VOLGRP=$2
LOGVOL=$3
DEVICE=$4
# Iterate over all the nvme devices, looking for those in /dev
for blkdev in $( nvme list | awk '/^\/dev/ {print $1 }' ); do
	# For each device grab the desired device name from the vendor data
	mapping=$(nvme id-ctrl --raw-binary "${blkdev}" 2>/dev/null | cut -c3073-3104 | sed 's/ //g')
	# If the desired device name is one of those currently requested
	if [[ -n "${mapping}" ]] && echo "$*" | grep -qw "${mapping}"; then
		# Repoint our device variable to the real device
		DEVICE="$blkdev"
		# Then partition it for use
		parted $DEVICE --script -- mklabel gpt
		parted -a optimal $DEVICE mkpart primary 0% 100%
		partprobe
		sleep 1
	fi
done
vgcreate $VOLGRP ${DEVICE}p1
lvcreate -l 100%FREE -n ${LOGVOL} ${VOLGRP}
mkfs.ext4 -L $LABEL /dev/mapper/${VOLGRP}-${LOGVOL}
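
A typical invocation, following the sample in the header and assuming a hypothetical mount point of /opt/splunk/frozen (the script deliberately leaves the fstab entry and the mount to you):

/root/cf/define_logical_volume.sh SPLUNKFROZEN vg_frozen lv_frozen xvdj
mkdir -p /opt/splunk/frozen
echo "LABEL=SPLUNKFROZEN /opt/splunk/frozen ext4 defaults,nofail 0 2" >> /etc/fstab
mount /opt/splunk/frozen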

/root/cf/define_swap_volume.sh:

#!/bin/bash
# Provides a simple way to prepare and initialize a swap
# volume on AWS Nitro-based instances.
# NOTE: Unlike define_logical_volume.sh, this script DOES
#       create an fstab entry.
#
# Syntax: define_swap_volume.sh <LABEL> <DEVICE>
# Sample: define_swap_volume.sh SWAP xvdb
LABEL=$1
DEVICE=$2
# Iterate over all the nvme devices, looking for those in /dev
for blkdev in $( nvme list | awk '/^\/dev/ {print $1 }' ); do
	# For each device grab the desired device name from the vendor data
	mapping=$(nvme id-ctrl --raw-binary "${blkdev}" 2>/dev/null | cut -c3073-3104 | sed 's/ //g')
	# If the desired device name is one of those currently requested
	if [[ -n "${mapping}" ]] && echo "$*" | grep -qw "${mapping}"; then
		# Repoint our device variable to the real device
		DEVICE="$blkdev"
		# Then partition it for use
		parted $DEVICE --script -- mklabel gpt
		parted -a optimal $DEVICE mkpart primary 0% 100%
		partprobe
		sleep 1
		# Format and enable swap on the new partition (not the raw device)
		mkswap -L "${LABEL}" "${DEVICE}p1"
		swapon "${DEVICE}p1"
		echo "LABEL=${LABEL}         swap                swap   defaults,nofail       0       0" >> /etc/fstab
	fi
done
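
A sample run matching the header comment, followed by a quick check that the swap space is active:

/root/cf/define_swap_volume.sh SWAP xvdb
swapon --show
free -h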

/usr/local/sbin/initialize_nvme_storage.sh:

#!/bin/bash
#
# This needs to be called on every boot, before Splunk starts. It
# initializes the local (instance-store) NVMe drives as a RAID-0 array.
#
# NOTE: This determines which NVMe drives are local based on
#       whether they are 2.5TB or 7.5TB! THIS IS NOT A GOOD
#       WAY, but works in a pinch. If you create 2.5TB EBS
#       volumes, you're in for some trouble.
if [ ! -b /dev/md0 ]; then
	# We are fresh or on new hardware. Recreate the RAID.
	rm -f /etc/mdadm.conf 2> /dev/null
	DEVICES=$( nvme list | egrep "[72].50  TB" | awk '{print $1}' )
	NUM=$( nvme list | egrep "[72].50  TB" | wc -l )
	mdadm --create --force --verbose /dev/md0 --level=0 --name=SMARTSTORE_CACHE --raid-devices=${NUM} ${DEVICES}
	mkfs -t xfs /dev/md0
	mkdir -p /opt/splunk/var/lib/splunk 2> /dev/null
	chown splunk:splunk /opt/splunk 2> /dev/null
	mdadm --verbose --detail --scan | tee -a /etc/mdadm.conf
fi
# Alternatively, could be mounted to /opt/splunk/var
mount /dev/md0 /opt/splunk/var/lib/splunk
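
After the first boot it is worth confirming that the array assembled with the expected number of local drives and that the cache volume is mounted:

cat /proc/mdstat
mdadm --detail /dev/md0
df -h /opt/splunk/var/lib/splunk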

/etc/systemd/system/ephemeral-init.service:

# Runs the NVMe storage initialization script on
# boot.
# Because splunk is started by an init.d script, we
# cannot set dependencies in here, and instead must
# also modify the splunk init script.
[Unit]
#DefaultDependencies=no
#After=sysinit.target local-fs.target
#Before=base.target
RequiresMountsFor=/opt/splunk

[Service]
ExecStart=/usr/local/sbin/initialize_nvme_storage.sh

[Install]
WantedBy=default.target
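
The unit still has to be enabled so it runs on every boot:

systemctl daemon-reload
systemctl enable ephemeral-init.service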

/etc/init.d/splunkd

The Splunk init script needs to be modified to wait for the instance-store RAID array to be mounted before starting Splunk. To do so, modify the splunk_start() function in the script so that it begins with:

# WaitForMount BEGIN
  echo Waiting for mount point...
  count=0
  while ! mountpoint -q /opt/splunk/var/lib/splunk
  do
    echo "Mount point not ready. Sleep 1 second..."
    sleep 1
    count=`expr $count + 1`
    if test $count -eq 90
    then
      echo "timed out!"
      exit 1
    fi
  done
# WaitForMount END
  echo Starting Splunk...