05/28/2020
These are the notes for migrating the Splunk indexers to an Auto Scaling Group (ASG).
module.moose_cluster.module.indexer_cluster.module.indexer2.aws_launch_configuration.splunk_indexer
module.moose_cluster.module.indexer_cluster.module.indexer2.aws_autoscaling_group.splunk_indexer_asg
terraform destroy \
  -target=module.moose_cluster.module.indexer_cluster.module.indexer0.aws_launch_configuration.splunk_indexer \
  -target=module.moose_cluster.module.indexer_cluster.module.indexer0.aws_autoscaling_group.splunk_indexer_asg \
  -target=module.moose_cluster.module.indexer_cluster.module.indexer1.aws_launch_configuration.splunk_indexer \
  -target=module.moose_cluster.module.indexer_cluster.module.indexer1.aws_autoscaling_group.splunk_indexer_asg \
  -target=module.moose_cluster.module.indexer_cluster.module.indexer2.aws_launch_configuration.splunk_indexer \
  -target=module.moose_cluster.module.indexer_cluster.module.indexer2.aws_autoscaling_group.splunk_indexer_asg

terraform destroy \
  -target=module.moose_cluster.module.indexer_cluster.module.indexer0.aws_launch_template.splunk_indexer \
  -target=module.moose_cluster.module.indexer_cluster.module.indexer0.aws_autoscaling_group.splunk_indexer_asg \
  -target=module.moose_cluster.module.indexer_cluster.module.indexer1.aws_launch_template.splunk_indexer \
  -target=module.moose_cluster.module.indexer_cluster.module.indexer1.aws_autoscaling_group.splunk_indexer_asg \
  -target=module.moose_cluster.module.indexer_cluster.module.indexer2.aws_launch_template.splunk_indexer \
  -target=module.moose_cluster.module.indexer_cluster.module.indexer2.aws_autoscaling_group.splunk_indexer_asg
Current moose subnet: subnet-07312c554f (main-infrastructure-public-us-east-1c)
ASG subnet: subnet-0b1e9d82bc (main-infrastructure-public-us-east-1a)
resource "aws_launch_configuration" "splunk_indexer" {
name = "${var.launch_conf_name}"
instance_type = "${var.idx_instance_type}"
image_id = "${var.ami}"
user_data = "${var.user_data}"
security_groups = ["${var.indexer_security_group_ids}"]
associate_public_ip_address = false
key_name = "${var.key_name}"
iam_instance_profile = "${var.iam_instance_profile}"
root_block_device = "${var.root_block_device}"
ebs_block_device = "${local.ebs_block_device}"
ebs_optimized = true
ephemeral_block_device = [
{
device_name = "xvdaa"
virtual_name = "ephemeral0"
},
{
device_name = "xvdab"
virtual_name = "ephemeral1"
},
{
device_name = "xvdac"
virtual_name = "ephemeral2"
},
{
device_name = "xvdad"
virtual_name = "ephemeral3"
},
{
device_name = "xvdae"
virtual_name = "ephemeral4"
},
{
device_name = "xvdaf"
virtual_name = "ephemeral5"
},
{
device_name = "xvdag"
virtual_name = "ephemeral6"
},
{
device_name = "xvdah"
virtual_name = "ephemeral7"
},
{
device_name = "xvdai"
virtual_name = "ephemeral8"
},
{
device_name = "xvdaj"
virtual_name = "ephemeral9"
},
{
device_name = "xvdak"
virtual_name = "ephemeral10"
},
{
device_name = "xvdal"
virtual_name = "ephemeral11"
},
{
device_name = "xvdam"
virtual_name = "ephemeral12"
},
{
device_name = "xvdan"
virtual_name = "ephemeral13"
},
{
device_name = "xvdao"
virtual_name = "ephemeral14"
},
{
device_name = "xvdap"
virtual_name = "ephemeral15"
},
{
device_name = "xvdaq"
virtual_name = "ephemeral16"
},
{
device_name = "xvdar"
virtual_name = "ephemeral17"
},
{
device_name = "xvdas"
virtual_name = "ephemeral18"
},
{
device_name = "xvdat"
virtual_name = "ephemeral19"
},
{
device_name = "xvdau"
virtual_name = "ephemeral20"
},
{
device_name = "xvdav"
virtual_name = "ephemeral21"
},
{
device_name = "xvdaw"
virtual_name = "ephemeral22"
},
{
device_name = "xvdax"
virtual_name = "ephemeral23"
},
]
lifecycle {
create_before_destroy = true
}
}
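For the ASG migration this launch configuration is replaced by an aws_launch_template (the resource targeted in the second destroy command above). A minimal sketch of the equivalent template, assuming the same variables; the exact layout here is illustrative, not the final code:

resource "aws_launch_template" "splunk_indexer" {
  name          = "${var.launch_conf_name}"
  instance_type = "${var.idx_instance_type}"
  image_id      = "${var.ami}"
  key_name      = "${var.key_name}"
  ebs_optimized = true

  # Launch templates require user_data to be base64-encoded.
  user_data = "${base64encode(var.user_data)}"

  iam_instance_profile {
    name = "${var.iam_instance_profile}"
  }

  # Launch templates have no top-level security_groups or public-IP flags;
  # those move into a network_interfaces block.
  network_interfaces {
    associate_public_ip_address = false
    security_groups             = ["${var.indexer_security_group_ids}"]
  }

  # One block_device_mappings block per device; repeat for
  # xvdab..xvdax / ephemeral1..ephemeral23 as in the launch configuration above.
  block_device_mappings {
    device_name  = "xvdaa"
    virtual_name = "ephemeral0"
  }

  lifecycle {
    create_before_destroy = true
  }
}

The ASG side then swaps launch_configuration for a launch_template block referencing "${aws_launch_template.splunk_indexer.id}" with version = "$Latest".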
05/07/2020
ERROR:
module.moose_cluster.module.indexer_cluster.module.indexer0.aws_autoscaling_group.splunk_indexer_asg: 1 error(s) occurred:
aws_autoscaling_group.splunk_indexer_asg: "moose-splunk-asg-0": Waiting up to 10m0s: Need at least 1 healthy instances in ASG, have 0. Most recent activity: {
  ActivityId: "71d5c796-f6b8-7b06-600c-167c09da9b",
  AutoScalingGroupName: "moose-splunk-asg-0",
  Cause: "At 2020-05-05T16:49:03Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 0 to 1.",
  Description: "Launching a new EC2 instance. Status Reason: The requested configuration is currently not supported. Please check the documentation for supported configurations. Launching EC2 instance failed.",
  Details: "{\"Subnet ID\":\"subnet-0b1e9d82bc\",\"Availability Zone\":\"us-east-1a\"}",
  EndTime: 2020-05-05 16:49:05 +0000 UTC,
  Progress: 100,
  StartTime: 2020-05-05 16:49:05.566 +0000 UTC,
  StatusCode: "Failed",
  StatusMessage: "The requested configuration is currently not supported. Please check the documentation for supported configurations. Launching EC2 instance failed."
}
FIX: ebs_optimized needs to be set to false for the t2.small instance type; t2 instances do not support EBS optimization, which is what "requested configuration is currently not supported" means here.
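One way to avoid hard-coding this per environment is to derive the flag from the instance type. A sketch, not the deployed fix; the t2-prefix test is an assumption:

# Hypothetical guard: t2 instances do not support EBS optimization.
ebs_optimized = "${substr(var.idx_instance_type, 0, 2) == "t2" ? "false" : "true"}"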
05/06/2020
Run the salt highstate twice:
The first run gets 'stuck' when invoked via salt-call in cloud-init (??). Kill it with saltutil.kill_job 20200528224000719269
RHEL subscription fails (Error: Must specify an activation key); the pillar must be bad! Re-run: salt-call state.sls os_modifications.rhel_registration
Splunk install fails; re-run: salt-call state.sls splunk.new_install
salt moose-splunk-indexer-i* cmd.run 'systemctl restart splunkuf'
05/08/2020
The internal UFs are pointing to moose-splunk-indexer-1.msoc.defpoint.local:9998, moose-splunk-indexer-2.msoc.defpoint.local:9998, and moose-splunk-indexer-3.msoc.defpoint.local:9998. Static hostnames are not going to work with an ASG, so we switched to IDXC Discovery.
Collectd is pointing at the moose-splunk-indexers DNS > moose-splunk-indexers.msoc.defpoint.local > internal IPs :sad-face:
Change it to an internal ELB pointing at a target group.
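For reference, the forwarder side of IDXC Discovery is a small outputs.conf change: instead of a static server list, the tcpout group asks the cluster master for the live peer list. A sketch, assuming the moose cluster master on 8089; stanza names and the key placeholder are illustrative:

[indexer_discovery:moose]
master_uri = https://moose-splunk-cm.msoc.defpoint.local:8089
pass4SymmKey = <discovery-pass4SymmKey from pillar>

[tcpout:moose_indexers]
indexerDiscovery = moose

[tcpout]
defaultGroup = moose_indexers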
MOOSE LB #1 ( terraform/100-moose/elb.tf > moose_ext ): moose20190919200450791200000004, ALB, INTERNET-facing, listeners 443/8088; 443 forwards to 8088.
Target group moose20190919200449849800000003, resource "aws_lb_target_group" "moose_ext_8088", pointing at the dead moose indexers.
MOOSE LB #2 ( terraform/100-moose/elb.tf > moose ): originally set up this way for Phantom, which only supported one DNS name for all of Splunk; it now supports distributed Splunk. moose20190919200454975400000005, ALB, INTERNAL, listener 8088. Can I just point this at the moose-targets? Target group moose20190919200449849000000002, resource "aws_lb_target_group" "moose_8088", pointing at the dead moose indexers.
PROPOSED: create an internal ALB just for this thing and leave it in 100-moose. NOPE: the iratemoses route53 record points to ONE ELB with two listening ports. IDEA: create the iratemoses DNS in 100-moose and output the pieces up to moose. The 8089 target group moose20190919200449848700000001 points to moose-splunk-sh.
moose_int_target_group = "${aws_lb_target_group.moose_8088.arn}"
moose_ext_target_group = "${aws_lb_target_group.moose_ext_8088.arn}"
count = "${var.create_hec_lb == 1 ? 1 : 0 }"
CUSTOMER LB #1 ( terraform/modules/splunk_cluster/elb.tf > hec ): now working!
CUSTOMER LB #2 ( terraform/modules/splunk_cluster/elb.tf > nlb ): 9998, target group: -target. Already working for Moose via the TF variable var.lb_target_group from splunk_cluster/elb.tf.
CUSTOMER LB #3 ( terraform/modules/splunk_cluster/elb-classic-hec.tf > hec_classiclb ): 8088, now working!
target_group_arns is for ALBs or NLBs ONLY; create new target groups for port 8088.
IDXC Discovery: salt/fileroots/splunk/master/init.sls
Then update Route 53 to point at the ELBs.
PLAN: move the code to splunk_cluster, grab the output from splunk_indexer_asg, then set the variables to false for the customers.
count = "${var.make-lb == "true" ? 1 : 0 }"
count = "${ var.create_private_dns == 1 ? var.count : 0 }"
count = "${var.create_hec_lb == 1 ? 1 : 0 }"
count = "${var.create_moose_ext_lb == 1 ? 1 : 0 }"
count = "${var.create_moose_int_lb == 1 ? 1 : 0 }"
count = "${var.create_moose_int_lb == 1 ? local.search_head_count : 0 }"
count = "${var.create_moose_int_lb == 1 ? local.indexer_count : 0 }"
resource "aws_lb_target_group_attachment" "moose_ext_8088" {
count = "${local.indexer_count}"
target_group_arn = "${aws_lb_target_group.moose_ext_8088.arn}"
target_id = "${element(module.moose_cluster.idx_instances,count.index)}"
}
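Once the indexers live in ASGs, per-instance attachments like the one above stop working; instead each ASG is attached to the target group (or classic ELB) with aws_autoscaling_attachment, which is what the hec_classic_asg targets below are. A sketch; the idx_asg_names output is hypothetical:

resource "aws_autoscaling_attachment" "moose_ext_8088_asg" {
  count                  = "${local.indexer_count}"
  autoscaling_group_name = "${element(module.moose_cluster.idx_asg_names, count.index)}"

  # For an ALB/NLB target group:
  alb_target_group_arn = "${aws_lb_target_group.moose_ext_8088.arn}"

  # For a classic ELB (as with hec_classiclb) use instead:
  # elb = "${aws_elb.hec_classiclb.name}"
}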
terraform apply \
  -target=module.afs_cluster.module.indexer_cluster.aws_autoscaling_attachment.hec_classic_asg[2] \
  -target=module.afs_cluster.module.indexer_cluster.aws_autoscaling_attachment.hec_classic_asg[1] \
  -target=module.afs_cluster.module.indexer_cluster.aws_autoscaling_attachment.hec_classic_asg[0]

terraform apply -target=module.afs_cluster.module.indexer_cluster.aws_elb.hec_classiclb

terraform apply \
  -target=module.afs_cluster.module.indexer_cluster.aws_autoscaling_attachment.hec_classic_asg0 \
  -target=module.afs_cluster.module.indexer_cluster.aws_autoscaling_attachment.hec_classic_asg1 \
  -target=module.afs_cluster.module.indexer_cluster.aws_autoscaling_attachment.hec_classic_asg2

terraform destroy \
  -target=module.afs_cluster.module.indexer_cluster.aws_autoscaling_attachment.hec_classic_asg0 \
  -target=module.afs_cluster.module.indexer_cluster.aws_autoscaling_attachment.hec_classic_asg1 \
  -target=module.afs_cluster.module.indexer_cluster.aws_autoscaling_attachment.hec_classic_asg2
Internal DNS for -splunk-indexers: does anything use the customer-splunk-indexers DNS entry? Collectd uses moose-splunk-indexers. PROPOSED: create a new Route 53 record that points to the internal ALB instead of a static record.
resource "aws_route53_record" "indexers"
in customer.tf
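A sketch of the proposed record as an alias to the internal ALB; the zone variable and LB reference here are illustrative placeholders:

resource "aws_route53_record" "indexers" {
  zone_id = "${var.private_zone_id}"
  name    = "${var.cluster_name}-splunk-indexers.msoc.defpoint.local"
  type    = "A"

  # Alias records track the LB's IPs automatically, unlike static A records.
  alias {
    name                   = "${aws_lb.moose_int.dns_name}"
    zone_id                = "${aws_lb.moose_int.zone_id}"
    evaluate_target_health = true
  }
}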
Add additional variables for the new module:
asg_size_0 = 1
asg_size_1 = 1
asg_size_2 = 1
in the customer_env module
in cust_variables.sls: discovery-pass4SymmKey
in outputs.conf for the splunk nodes
salt moose*cm* state.sls splunk.master test=true --state-output=changes
1.6 Adjust the SH and HF outputs.conf to point at IDXC Discovery
1.7 rm /opt/splunk/etc/apps/{{ salt['pillar.get']('cluster_name') }}_sh_outputs/local/outputs.conf
1.8 rm /opt/splunk/etc/apps/{{ salt['pillar.get']('splunk:cluster_name') }}_hf_ec2_outputs/local/outputs.conf
1.9 Run salt states to change outputs
1.11 salt moose*sh* state.sls splunk.search_head test=true --state-output=changes
1.13 salt moose*hf* state.sls splunk.heavy_forwarder test=true --state-output=changes
1.14 Update all minions to IDXC discovery
1.14.1 salt mail* state.sls internal_splunk_forwarder test=true --state-output=changes
2.1 salt minion pillar.item collectd:hec_hostname
2.2 salt minion network.connect iratemoses.msoc.defpoint.local 8088
2.3 salt minion state.sls collectd test=true --state-output=changes
2.3 salt-run survey.diff *.local cp.get_file_str file:///etc/collectd.conf
2.4 Ensure collectd metrics are in moose splunk.
2.4.1 | mstats count WHERE index=collectd metric_name=* by host, metric_name
2.5 Ensure Splunk UFs are in moose splunk
2.5.1 index="_internal" sourcetype=splunkd source="/opt/splunkforwarder/var/log/splunk/splunkd.log" | stats count by host
salt '*' saltutil.kill_job <job_id>
3.1.4 pkg.upgrade to patch the server
3.2 Ensure three green checkmarks in the CM.
salt '*.local' network.connect moose-splunk-cm.msoc.defpoint.local 8089
/opt/splunk/bin/splunk offline --enforce-counts
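Note: splunk offline --enforce-counts waits until the cluster meets its replication and search factors again before the peer goes down, so the old indexers can be decommissioned without losing bucket copies.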
6.1.1 No users? Create etc/system/local/user-seed.conf then restart ( https://answers.splunk.com/answers/834/how-to-reset-the-admin-password.html )
systemctl stop sensu-agent
cd /opt/splunk/etc
mv passwd passwd.bak
vim system/local/user-seed.conf
[user_info]
PASSWORD = KbxvB97DBTXFcxKOqm0P
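Note: USERNAME defaults to admin when omitted; add USERNAME = admin under [user_info] to be explicit.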
6.2 Disable the service to prevent it from starting back up.
terraform destroy \
  -target=module.moose_cluster.module.indexer_cluster.module.indexers.aws_instance.this[0] \
  -target=module.moose_cluster.module.indexer_cluster.module.indexers.aws_instance.this[1] \
  -target=module.moose_cluster.module.indexer_cluster.module.indexers.aws_instance.this[2]
curl https://iratemoses.mdr-test.defpoint.com:8088 --insecure
8.2 RUN from laptop ON VPN:
salt-run survey.diff '.local' cp.get_file_str file:///opt/splunkforwarder/etc/apps/moose_outputs/default/outputs.conf
salt sensu cmd.run 'tail -50 /opt/splunkforwarder/var/log/splunk/splunkd.log'
salt phantom* cmd.run 'tail -200 /opt/splunkforwarder/var/log/splunk/splunkd.log | grep TcpOutputProc'
Check the Splunk forwarders for indexer discovery errors:
index=_internal sourcetype="splunkd" source="/opt/splunkforwarder/var/log/splunk/splunkd.log" component=IndexerDiscoveryHeartbeatThread
index=_internal host=-splunk-cm sourcetype=splunkd source="/opt/splunkforwarder/var/log/splunk/splunkd.log" 10.96.103.34 OR 10.96.101.248 OR 10.96.102.23 component=TcpOutputProc

Check the CM and SH for indexer discovery:
index=_internal host=-splunk-cm sourcetype=splunkd source="/opt/splunk/var/log/splunk/splunkd.log" component=CMIndexerDiscovery
index=_internal sourcetype="splunkd" source="/opt/splunk/var/log/splunk/splunkd.log" host="moose-splunk-sh.msoc.defpoint.local" component=TcpOutputProc "Initialization time for indexer discovery service"
terraform apply \
  -target=module.moose_cluster.module.indexer_cluster.module.indexer2.aws_launch_template.splunk_indexer \
  -target=module.moose_cluster.module.indexer_cluster.module.indexer2.aws_autoscaling_group.splunk_indexer_asg
ERROR: http://reposerver.msoc.defpoint.local/splunk/7.2/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found
After the salt highstate, splunkuf is running but splunk is not. salt moose-splunk-indexer-i* cmd.run 'systemctl stop splunkuf'
ERROR: "/opt/splunk/etc/slave-apps/TA-Frozen-S3/bin/coldToFrozenS3.py"; no such script could be found on the filesystem. even after multiple attempts, Exiting.. ^^^ this is a bug in splunk!
FIX: added python3 to ALL servers, including the indexers, and moved coldToFrozenS3.py to /usr/local/bin.
moose-splunk-indexer-i-01dc07f6a5.msoc.defpoint.local
moose-splunk-indexer-i-0161555a16.msoc.defpoint.local - 50G - done: terminated, salt, sensu, victorops, scaleft, redhat
moose-splunk-indexer-i-0b9f30ce61.msoc.defpoint.local
moose-splunk-indexer-i-087ecc377c.msoc.defpoint.local
moose-splunk-indexer-i-0bada91cd6.msoc.defpoint.local - 50G - done: terminated, salt, sensu, victorops, scaleft, redhat
moose-splunk-indexer-i-055a31767d.msoc.defpoint.local - 50G - done: terminated, salt, sensu, victorops, scaleft, redhat
ERROR: ERROR IndexerDiscoveryHeartbeatThread - failed to parse response payload for group=afs-cluster, err=failed to extract FwdTarget from json node={"hostport":"?","ssl":false,"indexing_disk_space":-1} http_response=OK
SOLUTION: the indexers had no inputs.conf, so they were not listening for incoming connections.
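A minimal sketch of the missing listener, assuming the plain splunktcp port 9998 used elsewhere in these notes (the real config may use splunktcp-ssl and ship in a cluster bundle app):

[splunktcp://9998]
disabled = 0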
ERROR: coldToFrozen script not working. FIX: pip3 install awscli && chmod +x /usr/local/bin/aws