# Splunk SmartStore Thaw Notes

https://docs.splunk.com/Documentation/Splunk/latest/Indexer/Restorearchiveddata#Thaw_a_4.2.2B_archive

## Thawing Frozen Data

DO NOT thaw an archived (frozen) bucket into a SmartStore index! Create a separate, "classic" index that does not use SmartStore (no `remotePath`) and thaw the buckets into the `thawedPath` of that index. If you plan to thaw buckets frequently, you might want to create a set of non-SmartStore indexes that parallel the SmartStore indexes in name, for example "nonS2_main".

## Check disk size

How much data are you going to be thawing? It doesn't matter which indexer you copy the buckets to. Start with the first indexer and fill the drive to an acceptable level, then start copying buckets to the next indexer. If you run out of acceptable space on all the indexers, use TF to create more indexers and copy the buckets to the new ones.

## Create a new index

- Add the index to the CM repo and push it to the indexers.
- Ensure a `thawedPath` is specified. The `thawedPath` can NOT use a `volume:` reference.
- Name the index something similar to the SmartStore index, such as `nonS2_<index-name>`.
- Do not specify a `remotePath` for the index.

Files location: `master-apps/moose_all_indexes/local/indexes.conf`

```
#No SmartStore for thawing
#/opt/splunkdata/hot/splunk_db
[nons2_wineventlog]
homePath = volume:normal_primary/$_index_name/db
coldPath = volume:normal_primary/$_index_name/colddb
thawedPath = $SPLUNK_DB/nons2_wineventlog/thaweddb
tstatsHomePath = volume:normal_primary/$_index_name/datamodel_summary
#override defaults to ensure the data doesn't go anywhere
remotePath =
frozenTimePeriodInSecs = 188697600
```

## Restore the buckets from Glacier to S3

RUN s3cmd ON YOUR LAPTOP to restore the files from Glacier, NOT on the server (maybe? Does s3cmd support AssumeRole (STS) auth yet?). The easiest way to restore S3 objects from Glacier is the s3cmd script. Run these commands AFTER you set up awscli; awscli grabs the keys and s3cmd uses those keys (I think?).

https://github.com/s3tools/s3cmd

```
wget https://github.com/s3tools/s3cmd/archive/refs/heads/master.zip
unzip master.zip
cd s3cmd-master
./s3cmd --configure
```

s3cmd configuration: `./s3cmd --dump-config`

```
Default Region: us-gov-east-1
S3 Endpoint: s3.us-gov-east-1.amazonaws.com
DNS-style bucket+hostname:port template for accessing a bucket: %(bucket)s.s3-us-gov-east-1.amazonaws.com
Encryption password:
Path to GPG program: /bin/gpg
Use HTTPS protocol: True
HTTP Proxy server name: ( two spaces worked )
HTTP Proxy server port: 80
host_base = s3.us-gov-east-1.amazonaws.com
host_bucket = %(bucket)s.s3-us-gov-east-1.amazonaws.com
bucket_location = us-gov-east-1
```

Don't test access! Just save the config.

`./s3cmd ls`

Restore just one bucket:

`./s3cmd restore --restore-priority=expedited --restore-days=3 --recursive s3://xdr-moose-test-splunk-frozen/app_vault/frozendb/db_1620941330_1619445054_9_D783E822-5127-4A28-B6B5-6F42F5ACEC99/`

`~/s3cmd-master/s3cmd restore --restore-priority=expedited --restore-days=3 --recursive s3://xdr-moose-test-splunk-frozen/app_vault/frozendb/db_1620941733_1619458749_12_E9AE582F-FAA8-4D0C-974F-148229DEB8A1/`
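Expedited restores are not instant, and copies will fail for objects that are still in Glacier. Before copying, you can poll whether a bucket's objects have actually been restored. A minimal sketch, assuming awscli is already configured with access to the bucket; the bucket and prefix below are the example values from the commands above:

```
#!/bin/bash
# Poll the Glacier restore status of every object under one frozen bucket prefix.
# Assumes awscli is configured; the bucket/prefix are the example values above.
BUCKET=xdr-moose-test-splunk-frozen
PREFIX=app_vault/frozendb/db_1620941330_1619445054_9_D783E822-5127-4A28-B6B5-6F42F5ACEC99/

for key in $(aws s3 ls "s3://${BUCKET}/${PREFIX}" --recursive | awk '{ print $4 }'); do
  # head-object reports Restore: ongoing-request="true" while the restore is
  # still running, ongoing-request="false" once the object is readable,
  # and "None" if no restore has been requested for that object.
  status=$(aws s3api head-object --bucket "$BUCKET" --key "$key" --query Restore --output text)
  echo "$key -> $status"
done
```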
Restore one whole index only. This takes roughly 12-15 seconds per bucket: 1000 buckets ≈ 15 * 1000 seconds, or about 4 hours. Use tmux to keep the session alive!!!

Notice this will not restore rb_* (replicated) buckets if you use the exclude:

`./s3cmd restore --restore-priority=expedited --restore-days=14 --recursive --exclude="frozendb/rb_*" s3://xdr-afs-prod-splunk-frozen/wineventlog/`

## Copy the buckets from S3 into the thawedPath of the new index

Ensure the thaweddb folder exists after pushing out the new bundle to the indexers, then copy the buckets into it. The aws cli is probably not installed on the indexers, so see the splunk ma-c19 offboarding notes for creating a python3 venv.

`aws s3 ls`

### This one worked AFTER the files were restored

`~/awscli/bin/aws s3 cp s3://xdr-moose-test-splunk-frozen/app_vault/frozendb/db_1620941330_1619445054_9_D783E822-5127-4A28-B6B5-6F42F5ACEC99/ /opt/splunkdata/hot/splunk_db/nons2_app_vault/thaweddb/db_1620941330_1619445054_9_D783E822-5127-4A28-B6B5-6F42F5ACEC99/ --recursive --force-glacier-transfer`

`~/awscli/bin/aws s3 cp s3://xdr-moose-test-splunk-frozen/app_vault/frozendb/db_1620941733_1619458749_12_E9AE582F-FAA8-4D0C-974F-148229DEB8A1/ /opt/splunkdata/hot/splunk_db/nons2_app_vault/thaweddb/db_1620941733_1619458749_12_E9AE582F-FAA8-4D0C-974F-148229DEB8A1/ --recursive --force-glacier-transfer`

### Setup for the zztop.sh script

Once you can pull an individual bucket, use this to pull multiple buckets.

Make a list of ALL buckets in each index (use this for multiple indexes). First make a list of indexes:

`aws s3 ls s3://mdr-afs-prod-splunk-frozen | awk '{ print $2 }' > foo1`

`for i in $(cat foo1| egrep -v ^_); do aws s3 ls s3://mdr-afs-prod-splunk-frozen/${i}frozendb/ | egrep "db" | awk -v dir=$i '{ printf("s3://mdr-afs-prod-splunk-frozen/%sfrozendb/%s\n",dir,$2)}' ; done > bucketlist`

Use this for ONE index:

`~/awscli/bin/aws s3 ls s3://xdr-afs-prod-splunk-frozen/wineventlog/frozendb/ | egrep "db_16(1|2|3)" | awk -v dir=wineventlog '{ printf("s3://xdr-afs-prod-splunk-frozen/%s/frozendb/%s\n",dir,$2)}' > bucketlist`

Break up the list (10 indexers in this case):

`cat bucketlist | awk '{ x=NR%10 }{print >> "indexerlist"x}'`

Break up the list (3 indexers in this case):

`cat bucketlist | awk '{ x=NR%3 }{print >> "indexerlist"x}'`

Move the files to the salt master to distribute them, or just manually put them on different indexers:

- install the awscli on the indexers
- move the indexerlistX file to each indexer
- copy zztop.sh to each indexer

zztop.sh:

```
#!/bin/bash
# Copy one frozen bucket (passed as $1, an s3:// URL from the indexerlist file)
# into the thawedPath of the non-SmartStore index.
DEST=$( echo $1 | awk -F/ '{ print "/opt/splunkdata/hot/splunk_db/nons2_wineventlog/thaweddb/"$6 }' )
mkdir -p "$DEST"
/root/awscli/bin/aws s3 cp "$1" "$DEST" --recursive --force-glacier-transfer --no-progress
```

Try out one line to ensure the DEST is correct:

`egrep -h "*" indexerlist* | head -1 | awk -F/ '{ print "/opt/splunkdata/hot/splunk_db/nons2_wineventlog/thaweddb/"$6 }'`

Try one splunk bucket:

`egrep -h "*" indexerlist* | head -1 | xargs -P 10 -n 1 ./zztop.sh`

Go for it. Use tmux to avoid session timeout. Adjust the -P number to increase threads:

`egrep -h "*" indexerlist* | xargs -P 3 -n 1 ./zztop.sh`

Change ownership to ensure the folders are owned by the splunk user:

`chown -R splunk: *`

Not sure why, but it named some buckets with an "inflight-" prefix.

fix_names.sh:

```
#!/bin/bash
# Strip the "inflight-" prefix that some copied buckets ended up with.
prefix="inflight-"
for i in $(ls)
do
  if [[ $i == "$prefix"* ]]; then
    echo "$i"
    k=${i#"$prefix"}
    echo "$k"
    mv "$i" "$k"
  fi
done
```

## Restart the Indexers

Just do a rolling restart?
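The Splunk doc linked at the top ("Thaw a 4.2+ archive") also calls for running `splunk rebuild` on each thawed bucket before the restart, so the indexer regenerates the bucket's index and metadata files. A minimal sketch, assuming Splunk is installed under /opt/splunk and the thawedPath from the example index above; both paths are assumptions, adjust to the environment:

```
#!/bin/bash
# On each indexer: rebuild every thawed bucket (per the Restorearchiveddata doc
# linked at the top of these notes).
# Assumes Splunk lives in /opt/splunk and the thawedPath from the example index.
THAWED=/opt/splunkdata/hot/splunk_db/nons2_wineventlog/thaweddb

for bucket in "$THAWED"/db_*; do
  /opt/splunk/bin/splunk rebuild "$bucket"
done

# Then, on the cluster manager, roll the restart across the peers:
# /opt/splunk/bin/splunk rolling-restart cluster-peers
```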
## Datamodel Acceleration

Please note that restoring the data will not automatically add it to the current datamodels. Once the data is restored, Splunk will start accelerating it, assuming acceleration is configured and the restored data falls within what is configured to be accelerated.

## Data removal

Once the data is no longer needed, you can simply delete it from the thawed directory.
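A minimal cleanup sketch, run on each indexer once the thawed data is no longer needed; it assumes the same non-SmartStore index and thawedPath used in the examples above:

```
#!/bin/bash
# Delete thawed buckets from the non-SmartStore index once they are no longer needed.
# Assumes the thawedPath from the indexes.conf example earlier in these notes.
THAWED=/opt/splunkdata/hot/splunk_db/nons2_wineventlog/thaweddb

du -sh "$THAWED"        # sanity-check how much data is about to be removed
rm -rf "$THAWED"/db_*   # remove the thawed buckets, keep the thaweddb directory itself
```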