@@ -168,7 +168,7 @@ FIX: ebs optimize needs to be set to false for t2.small instance size.
Run the salt highstate twice
-first time it gets 'stuck' when run with salt-call in the cloud-init.
+first time it gets 'stuck' when run with salt-call in the cloud-init??
kill it with saltutil.kill_job 20200528224000719269
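If the JID isn't handy, the active-jobs runner will show it; a minimal sketch (the JID below is the one from this run):

```
# list running jobs on the master to find the stuck highstate's JID
salt-run jobs.active
# then kill that job across the minions
salt '*' saltutil.kill_job 20200528224000719269
```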
RHEL subscription failing (Error: Must specify an activation key). The pillar must be bad!
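To confirm whether the pillar is really the problem, something like the following helps; the `rhsm` pillar key and the key/org placeholders are assumptions, not the actual pillar layout here:

```
# dump the RHSM-related pillar on the failing minion (pillar key name is an assumption)
salt <minion> pillar.get rhsm
# test registration by hand on the node with the values the pillar should be supplying
subscription-manager register --activationkey=<ACTIVATION_KEY> --org=<ORG_ID>
```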
@@ -305,10 +305,10 @@ in outputs.conf for splunk nodes
# Steps for migration to PROD
1. Set up IDXC Discovery on CM
-1.1 prep CM for IDXC by rm -rf the outputs app
+1.1 prep the Cluster Master for IDXC by rm -rf'ing the current outputs app
1.1.2 rm /opt/splunk/etc/apps/{{ pillar['cluster_name'] }}_cm_outputs/
-1.5 Run salt state to enable IDXC discovery and enable IDXC outputs
-1.5.1 `salt moose*cm* state.sls splunk.master`
+1.5 Run the salt state on the CM to enable IDXC discovery and IDXC outputs. This will replace the deleted outputs app.
+1.5.1 `salt moose*cm* state.sls splunk.master test=true --state-output=changes`
1.6 adjust the SH and HF outputs.conf to point to IDXC Discovery (see the sketch after 1.8)
1.7 rm /opt/splunk/etc/apps/{{ salt['pillar.get']('cluster_name') }}_sh_outputs/local/outputs.conf
1.8 rm /opt/splunk/etc/apps/{{ salt['pillar.get']('splunk:cluster_name') }}_hf_ec2_outputs/local/outputs.conf
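For reference, a minimal sketch of the IDXC discovery wiring that 1.5-1.8 end up with; the pass4SymmKey and the group/stanza names are assumptions, the real values come from the salt states and pillar:

```
# CM, server.conf
[indexer_discovery]
pass4SymmKey = <KEY>

# SH/HF, outputs.conf
[indexer_discovery:moose]
pass4SymmKey = <KEY>
master_uri = https://moose-splunk-cm.msoc.defpoint.local:8089

[tcpout:moose_idxc]
indexerDiscovery = moose

[tcpout]
defaultGroup = moose_idxc
```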
@@ -316,23 +316,42 @@ in outputs.conf for splunk nodes
1.11 `salt moose*sh* state.sls splunk.search_head test=true --state-output=changes`
1.13 `salt moose*hf* state.sls splunk.heavy_forwarder test=true --state-output=changes`
1.14 Update all minions to IDXC discovery
-2. Update all minions to new collectd internal endpoint: iratemoses.mdr.defpoint.com
-2.1 ensure pillar is refreshed ` salt minion pillar.item `
+1.14.1 `salt mail* state.sls internal_splunk_forwarder test=true --state-output=changes`
+2. set up the new iratemoses endpoint in TF.
+2.0.1 TF apply in 05-customer_portal (open SGs) and 100-moose (create DNS)
+2.0 Update all minions to new collectd internal endpoint: iratemoses.msoc.defpoint.local
+2.1 ensure pillar is refreshed: `salt minion pillar.item collectd:hec_hostname`
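pillar.item only shows what the minion already has; to force a refresh first, a minimal sketch:

```
# push updated pillar data to the minion, then confirm the new value
salt minion saltutil.refresh_pillar
salt minion pillar.item collectd:hec_hostname
```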
2.2 `salt minion network.connect iratemoses.msoc.defpoint.local 8088`
2.3 `salt minion state.sls collectd test=true --state-output=changes`
+2.3.1 `salt-run survey.diff *.local cp.get_file_str file:///etc/collectd.conf`
+2.4 Ensure collectd metrics are in moose splunk.
+2.4.1 `| mstats count WHERE index=collectd metric_name=* by host, metric_name`
+2.5 Ensure Splunk UFs are in moose splunk
+2.5.1 `index="_internal" sourcetype=splunkd source="/opt/splunkforwarder/var/log/splunk/splunkd.log" | stats count by host`
3. stand up new templates and ASGs
3.1 launch new ASG instances
+3.1.1 run highstate on new indexers
+3.1.2 kill a defunct highstate with this:
+3.1.3 `salt '*' saltutil.kill_job <job_id>`
+3.1.4 run pkg.upgrade to patch the server
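A minimal sketch for 3.1.4; the indexer target glob is an assumption, adjust to the real minion IDs:

```
# patch the new indexers once the highstate is clean
salt 'moose*idx*' pkg.upgrade
# reboot only if the updates require it (e.g. a new kernel)
salt 'moose*idx*' system.reboot
```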
3.2 Ensure three green checkmarks in CM
4. change salt master to new outputs (make sure it is working)
4.1 ensure they can connect first
4.2 `salt '*.local' network.connect moose-splunk-cm.msoc.defpoint.local 8089`
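One way to confirm the salt master actually picked up the new outputs, assuming a UF install under /opt/splunkforwarder:

```
# show the resolved outputs.conf settings and which file each came from
/opt/splunkforwarder/bin/splunk btool outputs list --debug
```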
-5.
5. silence sensu
6. manually take the non-ASG indexers offline but don't stop the instances (yes, you can offline 2 indexers at a time)
6.1 `/opt/splunk/bin/splunk offline --enforce-counts`
6.1.1 no users? create etc/system/local/user-seed.conf then restart ( https://answers.splunk.com/answers/834/how-to-reset-the-admin-password.html )
+
+`mv /opt/splunk/etc/passwd /opt/splunk/etc/passwd.bak`
+
+```
+[user_info]
+PASSWORD = NEW_PASSWORD
+```
+
6.2 disable the service to prevent it from starting back up
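A sketch for 6.2, assuming boot-start was enabled via systemd; the unit name may be splunk or Splunkd depending on how boot-start was set up:

```
# stop and disable Splunk on the old indexer so it can't rejoin the cluster on reboot
sudo systemctl stop splunk
sudo systemctl disable splunk
```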
-7. use tf to destroy the instances then remove the code from TF. <- this is tricky
+7. use tf to destroy the instances, then remove the code from TF. <- this is tricky; create a new git branch first
7.1 `terraform destroy -target=module.moose_cluster.module.indexer_cluster.module.indexers.aws_instance.this[0] -target=module.moose_cluster.module.indexer_cluster.module.indexers.aws_instance.this[1] -target=module.moose_cluster.module.indexer_cluster.module.indexers.aws_instance.this[2]`
8. ensure all LBs are pointing to the new indexers
8.1 Run from laptop NOT on VPN `curl https://iratemoses.mdr-test.defpoint.com:8088 --insecure`
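Beyond the bare TCP check, HEC has a health endpoint that exercises the LB path to the new indexers:

```
# should return HEC health JSON if the LB reaches a healthy indexer
curl --insecure https://iratemoses.mdr-test.defpoint.com:8088/services/collector/health
```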
@@ -377,4 +396,7 @@ moose-splunk-indexer-i-055a31767d05fb053.msoc.defpoint.local - 50G - done termin
+ERROR:
+ERROR IndexerDiscoveryHeartbeatThread - failed to parse response payload for group=afs-cluster, err=failed to extract FwdTarget from json node={"hostport":"?","ssl":false,"indexing_disk_space":-1}http_response=OK
+SOLUTION: the indexers had no inputs.conf! They were not listening for incoming connections.
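For reference, a minimal sketch of the inputs.conf the indexers were missing; port 9997 is the usual receiving port and an assumption for this deployment:

```
# indexer inputs.conf - opens the splunktcp listener for forwarder traffic
[splunktcp://9997]
disabled = 0
```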