DO NOT thaw an archived (frozen) bucket into a SmartStore index!
Create a separate, "classic" index that does not utilize SmartStore (no remotePath) and thaw the buckets into the thawedPath of that index.
of that index. If you plan to thaw buckets frequently, you might want to create a set of non-SmartStore indexes that parallel the SmartStore indexes in name. For example, "nonS2_main".
How much data are you going to be thawing? It doesn't matter which indexer you copy the buckets to. Start with the first indexer and fill its drive to an acceptable level, then start copying buckets to the next indexer. If you run out of acceptable space on all the indexers, use TF to create more indexers and copy the buckets to the new indexers.
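To gauge how much free space each indexer has before you start copying, check the volume that holds splunk_db (the path below matches the $SPLUNK_DB location used later in these notes):
df -h /opt/splunkdata/hot/splunk_db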
Add the index to the CM repo and push it to the indexers. Ensure a thawedPath is specified. Name the index something similar to the SmartStore index, such as nonS2_index-name. Do not specify a remotePath for the index. The thawedPath can NOT use a volume: reference. Files location: master-apps/moose_all_indexes/local/indexes.conf
#No SmartStore for thawing
#$SPLUNK_DB on these indexers = /opt/splunkdata/hot/splunk_db
[nons2_wineventlog]
homePath = volume:normal_primary/$_index_name/db
coldPath = volume:normal_primary/$_index_name/colddb
thawedPath = $SPLUNK_DB/nons2_wineventlog/thaweddb
tstatsHomePath = volume:normal_primary/$_index_name/datamodel_summary
#override defaults to ensure the data doesn't go anywhere
remotePath =
frozenTimePeriodInSecs = 188697600
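To push the updated indexes.conf out from the cluster manager, the standard cluster-bundle commands should work (hedged sketch; /opt/splunk is an assumed install path, run as the splunk user on the CM):
/opt/splunk/bin/splunk validate cluster-bundle
/opt/splunk/bin/splunk apply cluster-bundle --answer-yes
/opt/splunk/bin/splunk show cluster-bundle-status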
RUN s3cmd ON YOUR LAPTOP to restore the files from Glacier, NOT on the server (maybe? Does it support AssumeRole (STS) auth yet?). The easiest way to restore S3 objects from Glacier is to use the s3cmd script. Run these commands AFTER you set up awscli. awscli grabs keys and s3cmd uses those keys (I think?). https://github.com/s3tools/s3cmd
wget https://github.com/s3tools/s3cmd/archive/refs/heads/master.zip
unzip master.zip
cd s3cmd-master
./s3cmd --configure
s3cmd configuration (use these values at the --configure prompts; verify the saved config with --dump-config)
./s3cmd --dump-config
Default Region: us-gov-east-1
S3 Endpoint: s3.us-gov-east-1.amazonaws.com
DNS-style bucket+hostname:port template for accessing a bucket: %(bucket)s.s3-us-gov-east-1.amazonaws.com
Encryption password:
Path to GPG program: /bin/gpg
Use HTTPS protocol: True
HTTP Proxy server name: <blank> ( two spaces worked )
HTTP Proxy server port: 80
host_base = s3.us-gov-east-1.amazonaws.com
host_bucket = %(bucket)s.s3-us-gov-east-1.amazonaws.com
bucket_location = us-gov-east-1
Don't test access! Just save the config.
./s3cmd ls
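Listing a bucket URL shows the top-level (index) prefixes inside the frozen bucket, which helps identify the frozendb paths used in the restore commands below (bucket name reused from the examples that follow):
./s3cmd ls s3://xdr-moose-test-splunk-frozen/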
Restore (thaw) just one bucket
./s3cmd restore --restore-priority=expedited --restore-days=3 --recursive s3://xdr-moose-test-splunk-frozen/app_vault/frozendb/db_1620941330_1619445054_9_D783E822-5127-4A28-B6B5-6F42F5ACEC99/
~/s3cmd-master/s3cmd restore --restore-priority=expedited --restore-days=3 --recursive s3://xdr-moose-test-splunk-frozen/app_vault/frozendb/db_1620941733_1619458749_12_E9AE582F-FAA8-4D0C-974F-148229DEB8A1/
Restore one whole index only. Restore requests take roughly 12-15 seconds per bucket, so 1000 buckets is about 12,000-15,000 seconds, or 3.5-4 hours.
Use tmux to keep the session alive!!! Note that this will not restore rb_* (replicated) buckets if you use the exclude.
./s3cmd restore --restore-priority=expedited --restore-days=14 --recursive --exclude="frozendb/rb_*" s3://xdr-afs-prod-splunk-frozen/wineventlog/
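Glacier restores are asynchronous. One way to check whether an object has finished restoring is aws s3api head-object, which reports a "Restore" field once the temporary copy is available (hedged sketch; <bucket_dir> is a placeholder, point the key at any real file inside a bucket directory, e.g. its rawdata/journal.gz):
~/awscli/bin/aws s3api head-object --bucket xdr-afs-prod-splunk-frozen --key wineventlog/frozendb/<bucket_dir>/rawdata/journal.gz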
Ensure the thaweddb folder exists after pushing out the new bundle to the indexers, then copy the bucket into that dir. The AWS CLI is probably not installed, so see the splunk ma-c19 offboarding notes for creating a python3 venv.
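A minimal sketch of that venv install, assuming python3 with the venv module is available on the indexer (the ~/awscli path matches the aws paths used in the commands below):
python3 -m venv ~/awscli
~/awscli/bin/pip install awscli
~/awscli/bin/aws --version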
aws s3 ls
~/awscli/bin/aws s3 cp s3://xdr-moose-test-splunk-frozen/app_vault/frozendb/db_1620941330_1619445054_9_D783E822-5127-4A28-B6B5-6F42F5ACEC99/ /opt/splunkdata/hot/splunk_db/nons2_app_vault/thaweddb/db_1620941330_1619445054_9_D783E822-5127-4A28-B6B5-6F42F5ACEC99/ --recursive --force-glacier-transfer
~/awscli/bin/aws s3 cp s3://xdr-moose-test-splunk-frozen/app_vault/frozendb/db_1620941733_1619458749_12_E9AE582F-FAA8-4D0C-974F-148229DEB8A1/ /opt/splunkdata/hot/splunk_db/nons2_app_vault/thaweddb/db_1620941733_1619458749_12_E9AE582F-FAA8-4D0C-974F-148229DEB8A1/ --recursive --force-glacier-transfer
Once you can pull an individual bucket, use this to pull multiple buckets.
Make a list of ALL buckets in each index (use this for multiple indexes).
Make a list of the index prefixes, then loop over it to build the bucket list:
aws s3 ls s3://mdr-afs-prod-splunk-frozen | awk '{ print $2 }' > foo1
for i in $(cat foo1| egrep -v ^_); do aws s3 ls s3://mdr-afs-prod-splunk-frozen/${i}frozendb/ | egrep "db" | awk -v dir=$i '{ printf("s3://mdr-afs-prod-splunk-frozen/%sfrozendb/%s\n",dir,$2)}' ; done > bucketlist
Use this for ONE index
~/awscli/bin/aws s3 ls s3://xdr-afs-prod-splunk-frozen/wineventlog/frozendb/ | egrep "db_16(1|2|3)" | awk -v dir=wineventlog '{ printf("s3://xdr-afs-prod-splunk-frozen/%s/frozendb/%s\n",dir,$2)}' > bucketlist
Break up the list (10 indexers in this case):
cat bucketlist | awk '{ x=NR%10 }{print >> "indexerlist"x}'
Break up the list (3 indexers in this case):
cat bucketlist | awk '{ x=NR%3 }{print >> "indexerlist"x}'
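Quick sanity check that the split accounts for every bucket in the list:
wc -l bucketlist indexerlist*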
Move the files to the Salt master to distribute them, or just manually put them on different indexers. Install the awscli on the indexers.
Move the appropriate indexerlistX file to each indexer and copy zztop.sh to each indexer (one possible Salt approach is sketched below).
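If the Salt master route is used, salt-cp can push the script to all indexers (hedged sketch; the 'splunk-idx*' target pattern is an assumption, adjust to your minion naming):
salt-cp 'splunk-idx*' zztop.sh /root/zztop.sh
salt 'splunk-idx*' cmd.run 'chmod +x /root/zztop.sh'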
zztop.sh
#!/bin/bash
# $1 is one s3:// bucket URL from the indexerlist file.
# Field 6 of the URL (split on "/") is the Splunk bucket directory name, e.g. db_<newest>_<oldest>_<id>_<guid>.
DEST=$( echo "$1" | awk -F/ '{ print "/opt/splunkdata/hot/splunk_db/nons2_wineventlog/thaweddb/"$6 }' )
mkdir -p "$DEST"
/root/awscli/bin/aws s3 cp "$1" "$DEST" --recursive --force-glacier-transfer --no-progress
Try out one line to ensure the DEST is correct.
egrep -h "*" indexerlist* | head -1 | awk -F/ '{ print "/opt/splunkdata/hot/splunk_db/nons2_wineventlog/thaweddb/"$6 }'
Try one Splunk bucket
egrep -h "*" indexerlist* | head -1 | xargs -P 10 -n 1 ./zztop.sh
Go for it. Use tmux to avoid session timeouts. Adjust the -P number to increase the parallel transfers.
egrep -h "*" indexerlist* | xargs -P 3 -n 1 ./zztop.sh
Change ownership to ensure the thawed folders are owned by the splunk user (run from inside the thaweddb directory).
chown -R splunk: *
Not sure why, but some of the copied buckets ended up named with an "inflight-" prefix.
fix_names.sh
#!/bin/bash
prefix="inflight-"
for i in $(ls)
do
    if [[ $i == *"$prefix"* ]]; then
        echo "$i"
        k=${i#"$prefix"}
        echo "$k"
        mv "$i" "$k"
    fi
done
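Run it from inside the thaweddb directory on each indexer (assumes the script was copied there and made executable):
cd /opt/splunkdata/hot/splunk_db/nons2_wineventlog/thaweddb
chmod +x ./fix_names.sh && ./fix_names.sh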
Just do a rolling restart? (If the frozen buckets contain only rawdata, each thawed bucket may also need a "splunk rebuild" before it becomes searchable.)
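Hedged sketch of those steps: the per-bucket rebuild (if needed) runs on the indexer that holds the bucket, and the rolling restart runs from the cluster manager (/opt/splunk is an assumed install path, <bucket_dir> is a placeholder):
/opt/splunk/bin/splunk rebuild /opt/splunkdata/hot/splunk_db/nons2_wineventlog/thaweddb/<bucket_dir>
/opt/splunk/bin/splunk rolling-restart cluster-peers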
Please note that restoring the data will not immediately add it to existing data model accelerations. Once the data is restored, Splunk will start accelerating it, assuming acceleration is configured and the restored data falls within what is configured to be accelerated.
Once the data is no longer needed, you can simply delete it from the thawed directory.
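For example (hedged; double-check the path before running, and a rolling restart afterward is a safe way to make sure Splunk drops the removed buckets):
rm -rf /opt/splunkdata/hot/splunk_db/nons2_wineventlog/thaweddb/db_*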