Splunk documentation: https://docs.splunk.com/Documentation/Splunk/latest/Indexer/AboutSmartStore
Items of note: `maxGlobalDataSizeMB`, `maxGlobalRawDataSizeMB`, and `frozenTimePeriodInSecs` control when data is frozen.
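A hedged illustration of how those three settings interact in `indexes.conf` (the stanza name and all values here are hypothetical; tune them per customer):

```ini
[example_index]
# Freeze buckets once the index's total size across the cluster
# exceeds ~500 GB (0 = unlimited, the default).
maxGlobalDataSizeMB = 512000
# Or freeze based on raw (uncompressed) data size instead; 0 = unlimited.
maxGlobalRawDataSizeMB = 0
# Freeze buckets whose newest event is older than ~1 year.
frozenTimePeriodInSecs = 31557600
```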
Commit 35d2254 enabled the creation of a SmartStore (S2) specific S3 bucket for every customer. To add it to existing customers/slices, copy the `145-splunk-smartstore-s3/` directory from `xdr-terraform-live/test/aws-us-gov/mdr-test-modelclient/` to the target customer's directory, update the referenced tag to v3.2.14 or higher if necessary, then run `terragrunt-local apply` to create the customer's SmartStore S3 bucket.
The search factor (SF) must equal the replication factor (RF). These settings are usually found in the Cluster Manager's `/etc/system/local/server.conf`.
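For example, in the Cluster Manager's `server.conf` (the factor values here are illustrative, not prescribed):

```ini
[clustering]
mode = master    # "manager" on newer Splunk versions
replication_factor = 3
search_factor = 3
```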
Ensure `maxDataSize` is set to `auto`. This has been corrected in the msoc-skeleton-cm configuration for all future customers; see commit 052d212. For existing customers, it may be necessary to remove a second `maxDataSize` entry (`maxDataSize = 5000`) in `master-apps/all_indexes/local/indexes.conf`.
Add `minFreeSpace = 20%` to the `[diskUsage]` stanza in `server.conf`. This can be placed in the CM's `master-apps/_cluster/local/server.conf` or in an app of its own:

```ini
[diskUsage]
minFreeSpace = 20%
```
Change the value of `path` to match the S3 bucket name created by Terraform:
```ini
[volume:smartstore]
storageType = remote
# Do we want a path or drop everything into '/'?
path = s3://xdr-CUSTOMER-ENVIRONMENT-splunk-smartstore/
remote.s3.endpoint = https://s3.us-gov-east-1.amazonaws.com
remote.s3.supports_versioning = true
remote.s3.encryption = sse-kms
remote.s3.kms.key_id = alias/SmartStore
remote.s3.kms.auth_region = us-gov-east-1
# SSL settings for S3 communications
remote.s3.sslVerifyServerCert = true
remote.s3.sslVersions = tls1.2
remote.s3.sslAltNameToCheck = s3.us-gov-east-1.amazonaws.com
# https://www.amazontrust.com/repository/SFSRootCAG2.pem
remote.s3.sslRootCAPath = $SPLUNK_HOME/etc/auth/SFSRootCAG2.pem
remote.s3.cipherSuite = ECDHE-ECDSA-AES128-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:AES128-SHA256:AES256-SHA256:AES256-SHA:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA256
remote.s3.ecdhCurves = prime256v1, secp384r1, secp521r1
# SSL settings for KMS communication
remote.s3.kms.sslVerifyServerCert = true
remote.s3.kms.sslVersions = tls1.2
remote.s3.kms.sslAltNameToCheck = kms.us-gov-east-1.amazonaws.com
remote.s3.kms.sslRootCAPath = $SPLUNK_HOME/etc/auth/SFSRootCAG2.pem
remote.s3.kms.cipherSuite = ECDHE-ECDSA-AES128-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:AES128-SHA256:AES256-SHA256:AES256-SHA:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA256
remote.s3.kms.ecdhCurves = prime256v1, secp384r1, secp521r1
```
NOTE: It may be possible to reduce the `cipherSuite` value to `TLSv1.2+HIGH:@STRENGTH` rather than specifying each cipher. The default value is `TLSv1+HIGH:TLSv1.2+HIGH:@STRENGTH`.
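If that simplification is attempted, the change would look like the following (untested; verify the indexers can still reach both S3 and KMS afterward):

```ini
remote.s3.cipherSuite = TLSv1.2+HIGH:@STRENGTH
remote.s3.kms.cipherSuite = TLSv1.2+HIGH:@STRENGTH
```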
The same commit referenced above for the msoc-skeleton-cm repository added `remotePath = volume:smartstore/$_index_name` to the `master-apps/all_indexes/local/indexes.conf` file.
NOTE: Before restarting the index cluster for the first time, log in to the Cluster Manager, ensure there are no pending fix-up tasks, and remove all excess buckets.
Copy `SFSRootCAG2.pem` to `$SPLUNK_HOME/etc/auth/`. This CA is used by Splunk to validate the certificates for the S3 bucket as well as for KMS queries against us-gov-east-1. See https://www.amazontrust.com/repository/. This file was added to the indexers' Salt state with commit 6d5dc54 and can be added to the indexers with `salt customer-splunk-idx-* state.sls splunk.indexer` (it may already be present).
`indexes.conf` and `server.conf` are deployed by the Cluster Manager. From the Salt server, run `salt customer-splunk-cm* state.sls splunk.master.apply_bundle_master test=true` to ensure the indexers will receive the SmartStore volume definition and updated `server.conf`. If it looks correct, run again without `test=true`. SSH to the Cluster Manager, become the splunk user, and run `splunk show cluster-bundle-status` (you may need to authenticate as the minion user) to observe the rolling cluster restart and correct any errors found in the bundle-validation stage.
SSH to one of the customer indexers and switch to the splunk user. You should be able to run `/usr/local/bin/aws s3 ls --region us-gov-east-1 s3://xdr-<customer>-<env>-splunk-smartstore/`. There will be no output if this is successful.
Additional tests:

```shell
# List all indexes and buckets
splunk cmd splunkd rfs -- ls --starts-with volume:smartstore

# List all buckets in a specific index
splunk cmd splunkd rfs -- ls --starts-with volume:smartstore/<some-index>/
```
Set `constrain_singlesite_buckets = false` in the Cluster Manager's `server.conf` under `[clustering]` and restart Splunk on the Cluster Manager. This can be applied via Salt with `salt <customer>-splunk-cm* state.sls splunk.master.init test=true`, then without `test=true` after validating that the INI change and the service restart (and perhaps a file-permissions change for etc/passwd) are the only changes.
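The resulting `[clustering]` stanza on the Cluster Manager would include (any existing settings in the stanza are omitted here and should be left in place):

```ini
[clustering]
mode = master
constrain_singlesite_buckets = false
```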
Check the Bucket Status panel and resolve any pending fixup tasks.
Add `remotePath = volume:smartstore/$_index_name` to an index such as `_introspection` in the Cluster Manager's copy of `master-apps/all_indexes/local/indexes.conf`, set `frozenTimePeriodInSecs = 0` and `maxGlobalDataSizeMB = 0`, then apply the change via Salt.
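A sketch of what that temporary test stanza might look like (the index's existing path settings are omitted here and should be left in place; restore the retention values after the test):

```ini
[_introspection]
remotePath = volume:smartstore/$_index_name
# Temporary values for the migration test (0 = unlimited/disabled)
frozenTimePeriodInSecs = 0
maxGlobalDataSizeMB = 0
```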
Monitor upload activity with the following search:

```
index=_internal sourcetype=splunkd TERM(action=upload)
| rex field=cache_id "\w+\|(?<indice>[^~]+)"
| stats count(eval(status=="attempting")) AS Attempting count(eval(status=="succeeded")) AS Succeeded count(eval(status=="failed")) AS Failed BY indice
| addcoltotals labelfield=indice
```
The `_introspection` index should appear in the search results with values under "Attempting" and "Succeeded". If the value under "Failed" is greater than zero, check `splunkd.log` on one of the indexers to troubleshoot.
Additional Splunk searches:

```
| rest /services/admin/cacheman/_metrics splunk_server=*-splunk-idx-*
| fields splunk_server migration.*
| rename migration.* AS *
| sort start_epoch
| eval Duration = end_epoch - start_epoch, Duration = tostring(Duration, "duration")
| convert timeformat="%F %T %Z" ctime(start_epoch) AS Start ctime(end_epoch) AS End
| eval Completed = round(current_job/total_jobs,4)*100, End = if(isnull(End), "N/A", End), status = case( status=="running", "Running", status=="finished", "Finished", true(), status )
| eventstats sum(eval(Completed/3)) AS overall
| eval overall = round(overall,2)
| fields splunk_server Start End Duration status total_jobs current_job Completed overall
| rename splunk_server AS "Splunk Indexer" status AS Status current_job AS "Current Job" total_jobs AS "Total Jobs"
| appendpipe [
    | stats count BY overall
    | eval "Current Job" = "Overall Completion"
    | rename overall AS Completed
    | fields Completed "Current Job"]
| fields - overall
| eval Completed = Completed . "%"
```
If Splunk restarts before the migration completes, the endpoint data may not be accurate. If that happens, run:

```
| rest /services/admin/cacheman splunk_server=*-splunk-idx-*
| search cm:bucket.stable=0
| stats count BY splunk_server
```
IMPORTANT: Do not forget to reconfigure the retention settings after the migration.
Add `remotePath = volume:smartstore/$_index_name` under the `[default]` stanza as well as under all other index definitions. This is a good time to update `indexes.conf` entries where a stanza is relying on values from `[default]` rather than having them defined per index.
For example:

```ini
[os]
homePath = volume:normal_primary/$_index_name/db
coldPath = volume:normal_primary/$_index_name/colddb
remotePath = volume:smartstore/$_index_name
thawedPath = $SPLUNK_DB/os/thaweddb
tstatsHomePath = volume:high_primary/$_index_name/datamodel_summary
coldToFrozenScript = "/usr/bin/python3" "/usr/local/bin/coldToFrozenS3.py"
frozenTimePeriodInSecs = 31557600
lastChanceIndex = lastchance
maxConcurrentOptimizes = 24
maxDataSize = auto
maxHotBuckets = 10
maxTotalDataSizeMB = 4294967295
quarantineFutureSecs = 172800
quarantinePastSecs = 604800
repFactor = auto
```
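As a quick sanity check on retention values such as `frozenTimePeriodInSecs = 31557600`, a small (illustrative, not part of any deployed tooling) helper converts days to seconds; 31557600 seconds is one Julian year (365.25 days):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def days_to_secs(days: float) -> int:
    """Convert a retention period in days to a frozenTimePeriodInSecs value."""
    return int(days * SECONDS_PER_DAY)

print(days_to_secs(365.25))  # 31557600, the ~1-year value used above
print(days_to_secs(90))      # 7776000 for a 90-day retention
```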
Once all the index definitions have `remotePath` defined, use Salt to apply the bundle change to the indexers. Observe the progress of the bundle application from the Cluster Manager as described above and use the Splunk search to observe data moving to S3.
DO NOT thaw an archived (frozen) bucket into a SmartStore index! Create a separate, "classic" index that does not utilize SmartStore (no `remotePath`) and thaw the buckets into the `thawedPath` of that index.
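A minimal sketch of such a classic index (the index name `thawed_restore` and its paths are hypothetical):

```ini
[thawed_restore]
# No remotePath setting: this index stays on local ("classic") storage.
homePath = $SPLUNK_DB/thawed_restore/db
coldPath = $SPLUNK_DB/thawed_restore/colddb
thawedPath = $SPLUNK_DB/thawed_restore/thaweddb
repFactor = auto
```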