[thwackCamp 2013 Chat Log] Building a Large-scale SolarWinds Installation

Version 1
    10/8/2013 13:0267.79.13.41francoishi all, how many devices are you guys running, approx, with orion?
    10/8/2013 13:0224.121.148.105superray72Awesome event so far. Perfect score on quiz.
    10/8/2013 13:0271.64.110.224LeonAdato(and my follow up question to Francois: how many ELEMENTS do you have?)
    10/8/2013 13:03206.181.226.34KMSigma1378 Nodes, 3352 Interfaces, 4251 Volumes = 8981 Elements.  :-)
    10/8/2013 13:0367.79.13.15Caraway to go superray72!!
    10/8/2013 13:03216.170.88.15wesleyoramaElements 5120; Network Node Elements 1040; Volume Elements 164; Interface Elements 3916
    10/8/2013 13:03140.142.107.178RichardLetts5632 nodes, 29571 elements
    10/8/2013 13:03216.161.174.250ZeeApprox 2200 nodes
    10/8/2013 13:0366.112.206.6Chip709 nodes, 6000 elements
    10/8/2013 13:03204.145.114.6mdriskellNodes Monitored 3590; Interfaces 4205; Volumes monitored 13425; Applications monitored 5407; Synthetic transactions 85
    10/8/2013 13:04118.209.52.240Stevenfrancoisinternally: ~20.  Clients: varied from a few up to thousands
    10/8/2013 13:0467.79.13.41francoisthanks all, amazing responses, keep it coming
    10/8/2013 13:04206.181.226.34KMSigma46 Sites across 14 timezones
    10/8/2013 13:04216.16.135.2jtp7402115548 Nodes  33154 elements
    10/8/2013 13: KMSigma, how many pollers is that or do you do individual Orion suits?
    10/8/2013 13:0570.169.66.194ToddLhow do you determine elements?
    10/8/2013 13:05209.22.221.73RobertWhat apps does your monitoring team cover?
    10/8/2013 13:0571.64.110.224LeonAdatoThe SolarWinds way - an IP, interface, or volume.
    10/8/2013 13:05206.181.226.34KMSigmaWe are dual datacenter.  One poller in each.
    10/8/2013 13:0571.64.110.224LeonAdatoRobertall of them
    10/8/2013 13:05216.37.63.186Josh1030 Nodes, 4564 Interfaces, 0 Volumes, 5594 Total Elements
    10/8/2013 13:0663.226.32.16ecklerwr1solarwinds adds them up if you put the resource on your page it will add up nodes, interfaces, and volumes to get total elements.
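A minimal sketch of the element arithmetic described above (total elements = nodes + interfaces + volumes). The `ElementCounts` class is a hypothetical helper for illustration, not part of any SolarWinds API; the figures are the ones Josh and wesleyorama posted earlier in the chat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementCounts:
    """Hypothetical container for the three element types counted in the chat."""
    nodes: int
    interfaces: int
    volumes: int

    @property
    def total(self) -> int:
        # Total elements = nodes + interfaces + volumes
        return self.nodes + self.interfaces + self.volumes


josh = ElementCounts(nodes=1030, interfaces=4564, volumes=0)
wesleyorama = ElementCounts(nodes=1040, interfaces=3916, volumes=164)

print(josh.total)         # 5594
print(wesleyorama.total)  # 5120
```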
    10/8/2013 13:06192.30.215.5ScottSadlochaOkay, I have two frames of reference.
    10/8/2013 13:06209.22.221.73RobertDo you have 7 people supporting only SolarWinds?
    10/8/2013 13:0667.79.13.41francoisAnybody running remote Additional Remote Pollers?
    10/8/2013 13:06206.181.226.34KMSigmaCore Details from Admin View will list elements
    10/8/2013 13:0662.105.164.174AshleyCpoller 1 Network Elements 1909 Nodes, 5361 Interfaces, 195 Volumes, 7465 Total Elements
    10/8/2013 13:06192.30.215.5ScottSadlochaCurrent environment:
    10/8/2013 13:06192.30.215.5ScottSadlochaElements 1584; Network Node Elements 278; Volume Elements 806; Interface Elements 500
    10/8/2013 13:0662.105.164.174AshleyCpoller 2 Network Elements 296 Nodes, 6083 Interfaces, 1136 Volumes, 7515 Total Elements
    10/8/2013 13:06206.116.73.53Penney4 additional pollers
    10/8/2013 13:07192.30.215.5ScottSadlochaPrevious company:
    10/8/2013 13:07192.30.215.5ScottSadlochaPrimary poller and 2 additional pollers; 22169 elements; 2074 nodes; 17232 interfaces; 2863 volumes; 86784 pollers
    10/8/2013 13: know of one client running 11 pollers, not sure of nodes / interfaces
    10/8/2013 13:07192.30.215.5ScottSadlochaAt my current, it is one location. At my previous, we had about 46 across the globe
    10/8/2013 13:07206.181.226.34KMSigmaI am.  Primary Poller (Primary Data Center), Secondary Poller (Secondary Data Center), Add'l Web (Primary Data Center)
    10/8/2013 13:0771.64.110.224LeonAdatoRobertWe currently have a team of 4 guys plus a manager who support all the monitoring tools.
    10/8/2013 13:0771.64.110.224LeonAdatoRobertincluding SolarWinds.
    10/8/2013 13:0763.226.32.16ecklerwr1I've got 2 NPM, NTA, SAM, Virtualization Manager (one west one east) with additional poller on each.
    10/8/2013 13:0767.79.13.41francoisRhidianSremotely or centrally (remotely = with a WAN between the poller and the DB)
    10/8/2013 13:0871.64.110.224LeonAdatoKMSigmaFrom Admin, lower left corner, it's "Polling Engines"
    10/8/2013 13:08206.181.226.34KMSigmaMy SQL is in Primary DC.
    10/8/2013 13:08192.30.215.5ScottSadlochaWow, that many people supporting?
    10/8/2013 13:08192.30.215.5ScottSadlochaAt my previous location, with the higher numbers, it was just me.
    10/8/2013 13:0863.226.32.16ecklerwr1everything virtualized including the SQL servers
    10/8/2013 13:09208.78.228.100aniketPrimary Poller + 4 additional Pollers . 25000 Elements / 3000 nodes
    10/8/2013 13:09204.64.42.24Briansuch a little fish in this big pond: Elements 997; Network Node Elements 331; Volume Elements 424; Interface Elements 242
    10/8/2013 13:0971.64.110.224LeonAdatoecklerwr1brave man
    10/8/2013 13:09118.209.52.240Stevenfrancois95% of the issues I run into with customer networks are firewall rules
    10/8/2013 13:0970.169.66.194ToddLwhere do you look up the number of elements in NPM
    10/8/2013 13:0965.192.236.140JFrazierHmmm... "A complete, accurate, meaningful inventory"... I want one !
    10/8/2013 13:09167.219.88.140newkidd24 guys for all network tools???  I have all network tools including most SW modules and 6 other network apps.  No wonder I have no hair left..  :-)
    10/8/2013 13:0971.64.110.224LeonAdatoSteven95% of the problems I run into are management rules.
    10/8/2013 13:0971.64.110.224LeonAdatoSteven;-)
    10/8/2013 13:09206.181.226.34KMSigmahttp://<SW-IP>/Orion/Admin/Details/OrionCoreDetails.aspx
    10/8/2013 13:0963.226.32.16ecklerwr1It works well now... I never virtualized SQL until NPM 10.x
    10/8/2013 13:09204.145.114.6mdriskellI have no hair or money since I met my wife...at least that's what my dad always told me
    10/8/2013 13:09192.30.215.5ScottSadlochaAt my previous company, I did monitoring, along with Endpoint Security, barcode scanners, and endpoint encryption. Yarrgh.
    10/8/2013 13:1071.64.110.224LeonAdatoecklerwr1I did it at my last gig, but in this one the environment is just too big.
    10/8/2013 13:10118.209.52.240StevenLeonAdatook, 95% of the technical reasons
    10/8/2013 13:1071.64.110.224LeonAdatoecklerwr1Plus they had a 24-proc, 192 GB RAM box sitting there for me, so I wasn't going to say NO.
    10/8/2013 13:1067.79.13.41francoisStevenyou mean to connect remote pollers to main site?
    10/8/2013 13:10206.116.73.53PenneyNodes 3,500 / Elements 22,000 / 5 Pollers
    10/8/2013 13:10140.142.107.178RichardLettslike: actionable procedure = alert
    10/8/2013 13:10192.30.215.5ScottSadlochaAt my current place, there are a few of us, depending on area.
    10/8/2013 13:1063.226.32.16ecklerwr1nice LeonAdato :^}
    10/8/2013 13:1069.30.53.222byronaCaraI think this is a great list of questions and should be used for any deployments, not just large deployments.
    10/8/2013 13:11118.209.52.240StevenfrancoisSorry, was meant to be in general, not direct to yourself. I mean clients need to configure their firewalls to allow the ICMP/SNMP traffic to/from the SW server and the devices they want to monitor
    10/8/2013 13:1171.64.110.224LeonAdatoecklerwr1I may not necessarily be accurate, but I'm usually entertaining.
    10/8/2013 13:11216.161.174.250Zeewhere did he find the permitted pollers enable/disable?
    10/8/2013 13:1171.64.110.224LeonAdatobyronaExactly
    10/8/2013 13:1171.64.110.224LeonAdatobyronaI think I said earlier - these are the same questions you'd ask for ANY implementation.
    10/8/2013 13:11204.145.114.6mdriskellDoes anyone else isolate their pollers? For instance I have 4 but I have one dedicated to trapping.  We've had issues where a rogue device basically attacked us sending 200K traps in less than an hour.  Crippled SW
    10/8/2013 13:1171.64.110.224LeonAdatobyronaBut in a large-scale one, the impact of missing is higher.
    10/8/2013 13:11167.219.88.140newkidd2syslog alerts still could use being tied to the node's Management Status...  It could save a lot of time and spam
    10/8/2013 13:1171.64.110.224LeonAdatobyronaIn a smaller install, you can "fake" it and nobody will notice.
    10/8/2013 13:1271.64.110.224LeonAdatomdriskellwait for the next slide.
    10/8/2013 13:12209.22.221.73RobertIs your SQL virtualized and clustered?
    10/8/2013 13:12204.145.114.6mdriskellWaiting for any slide
    10/8/2013 13:1271.64.110.224LeonAdatomdriskellI've seen very successful installs destroyed when syslog and trap are added.
    10/8/2013 13:12192.30.215.5ScottSadlochaBoth SW environments I have experience with were virtualized
    10/8/2013 13:1269.30.53.222byronaLeonAdatoSorry, I missed that part where you noted these are always good.  Unfortunately I am stuck working while attending this. = (
    10/8/2013 13:1263.234.242.12YourVillageIdiotElements 14,000 / Nodes 2300 / Pollers 2
    10/8/2013 13:1371.64.110.224LeonAdatoRobertMine is a physical box running 3 db instances - one for our incumbent database (a company with 3 letters for its name that rhymes with "Buy Me Them")
    10/8/2013 13:13173.13.67.184Stevei have to get off a shared SQL server .. it's killing me
    10/8/2013 13:1363.226.32.16ecklerwr1Mine is virtualized on ESX 5.1, not clustered
    10/8/2013 13:13209.22.221.73RobertHow many users do you have per additional webserver?
    10/8/2013 13:1371.64.110.224LeonAdatoEverything else in our environment is virtual. Only the database is physical.
    10/8/2013 13:1374.43.179.4jakeDo we need the EOC in order to use the DMZ web front end?
    10/8/2013 13:1371.64.110.224LeonAdatoRobertUsers?
    10/8/2013 13: question came up today at work about running a poller with 2 IP addresses, 1 for snmp/wmi/icmp and one for netflow. Is that possible?
    10/8/2013 13:1371.64.110.224LeonAdatoRobertUsers don't factor into it, really.
    10/8/2013 13:1471.64.110.224LeonAdatoRhidianSI'll let the SW guys answer that. I'm honestly not sure.
    10/8/2013 13:1471.64.110.224LeonAdatoRhidianSI've never done it that way.
    10/8/2013 13:14209.22.221.73RobertWe have users and admins on ours...what's your max limit before you add another additional webserver?
    10/8/2013 13:14140.142.107.178RichardLettsRhidianSnot a problem, but I'm not sure what the benefit is.
    10/8/2013 13:1467.79.13.41francoisRhidianSnever heard of that, but I'll double check and come back to you
    10/8/2013 13:1571.64.110.224LeonAdatoRobertAh, each additional poller doesn't run its own webserver
    10/8/2013 13:1571.64.110.224LeonAdatoRobertYour website is always running off the primary poller, OR an additional web server if you get it.
    10/8/2013 13:1571.64.110.224LeonAdatoRobertsee the slide on screen right now.
    10/8/2013 13:1571.64.110.224LeonAdatoRobertover to the left
    10/8/2013 13:1571.64.110.224LeonAdatoRobertor this slide too.
    10/8/2013 13:15173.13.67.184Steveadditional web server gets the load off your primary -
    10/8/2013 13:15192.30.215.5ScottSadlochaWe only used one webserver at my previous place. Same at my current. Only users were IT admins though.
    10/8/2013 13:15209.22.221.73RobertI know...I'm asking how many users you put on an Additional Webserver before you add another webserver.
    10/8/2013 13:16209.22.221.73RobertWhat's the max?
    10/8/2013 13:1671.64.110.224LeonAdatoZeeThat's under the windows services, SNMP agent, security tab.
    10/8/2013 13:1667.79.13.41francoisRobertwe typically say about 20 concurrent users
    10/8/2013 13:16209.22.221.73Robertor best practice?
    10/8/2013 13:16216.161.174.250Zeety
    10/8/2013 13:1671.64.110.224LeonAdatomdriskellthere's your trap and syslog stuff.
    10/8/2013 13:16204.145.114.6mdriskellI see...had to reload nothing had ever started on my end
    10/8/2013 13:16209.22.221.73RobertThanks Francois....what's everyone else using as your typical maximum?
    10/8/2013 13:1671.64.110.224LeonAdatoRobertWe have WAY more than that on our additional webserver.
    10/8/2013 13:1767.79.13.41francoisRobertanybody has a need for an Orion instance (main + pollers and single DB) that would handle 100,000 elements or more?
    10/8/2013 13:1771.64.110.224LeonAdatoRobertI've never given "maximum users" a thought. It's a webserver.
    10/8/2013 13:1771.64.110.224LeonAdatoRobertYeah - us!
    10/8/2013 13:1767.79.13.41francoisLeonAdatook :-)
    10/8/2013 13:1867.79.13.41francoisLeonAdatoanybody else on the 100K+ elements on a single instance?
    10/8/2013 13:18209.22.221.73RobertThen I guess you have to base it on your SLA for CPU and Memory... you'll probably increase your memory before adding another server first though.
    10/8/2013 13:1867.79.13.41francoisLeonAdatohow many? Are you talking WAY more concurrent accesses?
    10/8/2013 13:1971.64.110.224LeonAdatoRobertYeah, our additional web server only has 4 procs and 8 GB RAM. We have lots of room to grow.
    10/8/2013 13:1967.79.13.41francoisLeonAdatogotcha
    10/8/2013 13:20209.22.221.73RobertWe are running 50 now on our primary poller and looking at going to an additional webserver...just wondering the max concurrent connections we should expect before adding another additional webserver.
    10/8/2013 13:2071.64.110.224LeonAdatoRobertIn a large environment I move the webserving functions off because I want the primary nice and clean and happy.
    10/8/2013 13:2071.64.110.224LeonAdatoRobertand serving up those graphs and charts and poorly written SQL reports (guilty as charged) takes processing cycles away from the job of being a primary poller.
    10/8/2013 13:21204.64.223.35LanceRIf you move the webserver from the primary poller, and put it on its own server, does that require an additional license?  Or since you are pulling it from the primary poller, does that only count as one?
    10/8/2013 13:22140.142.107.178RichardLettsLanceRYes.
    10/8/2013 13:2271.64.110.224LeonAdatoLanceRAdditional webserver is its own product
    10/8/2013 13:2271.64.110.224LeonAdatoLanceRand for the record, you don't "move" it.
    10/8/2013 13:2271.64.110.224LeonAdatoLanceRThe webserver is still running on the primary. You just don't tell anyone it's there any more.
    10/8/2013 13:22204.64.223.35LanceRah, ok, that works...thanks!
    10/8/2013 13:2371.64.110.224LeonAdatoLanceRwhich means YOU have a super-secret portal you can use without all those pesky users getting in your way.
    10/8/2013 13:23208.57.0.134Jamesour teams provision inside the NPM for performance, but everyone else uses the separate servers
    10/8/2013 13:24206.116.73.53Penneywe added ramdisk to our primary engine which is also our webserver. only the iis temp files are directed to ramdisk - our orion website is fast, and we no longer experience lag after a restart.
    10/8/2013 13:2471.64.110.224LeonAdatoecklerwr1Along with my world-famous musical revue of Sondheim favorites, featuring my trained duck Waldo.
    10/8/2013 13:2463.226.32.16ecklerwr1:^}
    10/8/2013 13:2471.64.110.224LeonAdatoPenneythat's a kick-@ss idea!
    10/8/2013 13:24167.216.131.126ManhaakWorking on a new deployment plan, either centralized OR centralized with polling engines. What would be the best way to decide what's best for our environment? If I am expecting to have devices/components well below the load the servers can handle, do I really need an additional polling engine?
    10/8/2013 13:25206.116.73.53Penneyit wasn't my idea - it was a tip I got from an advanced Orion course I took (hosted by Andy McBride).
    10/8/2013 13:2563.226.32.16ecklerwr1Manhaak no as long as you can get icmp and snmp there.
    10/8/2013 13:25209.22.221.73RobertSo when you get up to around 4,000+ devices is that when you should consider offloading web?  I always thought it was based on the number of users accessing, not the number of devices.
    10/8/2013 13:2571.64.110.224LeonAdatoI was so excited during the recording I cut my CPUs in half.
    10/8/2013 13:2571.64.110.224LeonAdatoWe actually have 24
    10/8/2013 13:26216.168.115.179PaulOur new SQL server is going to be RAID 10 using SSD's - anyone using SSD drives in their setup?
    10/8/2013 13:2671.64.110.224LeonAdatoManhaakRemember the slide with DMZ?
    10/8/2013 13:26192.30.215.5ScottSadlochaNo SSDs here
    10/8/2013 13:2771.64.110.224LeonAdatoManhaakIf you can get to the devices from the core network (ie: where your poller is) then do that. If you can't - if it's a DMZ and you can't get all the firewall ports opened, then you will need a poller even if it's 3 devices.
    10/8/2013 13:2769.30.53.222byronaOur database is on SSD's.  We are using Raid 5 with our SSD's but that doesn't seem to be causing it any issues.
    10/8/2013 13:27209.22.221.73RobertWhy not virtualize SQL?
    10/8/2013 13:2769.30.53.222byronaOur database is also virtualized; however, it has a dedicated Host
    10/8/2013 13:2767.79.13.41francoisRobertit's more a matter of users, unless your 4000 devices have already maxed out your instance, in which case offloading with an additional web server is the right thing to do
    10/8/2013 13:27209.22.221.73RobertMy client is directing me to go virtual.
    10/8/2013 13:27140.142.107.178RichardLettsTimezones bad, UTC good.
    10/8/2013 13: there a stress test tool for testing an Orion server, to give a rough idea of capability?
    10/8/2013 13:27209.22.221.73RobertAgreed RichardLetts!
    10/8/2013 13:28118.209.52.240StevenIf a client has multiple timezones, we generally push them towards UTC
    10/8/2013 13:28118.209.52.240Steventhe underlying network is typically UTC anyway so it makes it easier
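A small sketch of why a single UTC clock makes multi-site events easier to compare. The two sites and their offsets are hypothetical, and fixed offsets are used for simplicity (no daylight-saving handling); only the Python standard library is involved.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_naive: datetime, utc_offset_hours: float) -> datetime:
    """Attach a fixed UTC offset to a naive local timestamp and convert it to UTC."""
    local = local_naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc)


# Hypothetical: the same outage logged at two sites in their own local time.
site_a = to_utc(datetime(2013, 10, 8, 13, 27), utc_offset_hours=-5)
site_b = to_utc(datetime(2013, 10, 8, 19, 27), utc_offset_hours=+1)

print(site_a.isoformat())  # 2013-10-08T18:27:00+00:00
print(site_a == site_b)    # True
```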
    10/8/2013 13:2871.64.110.224LeonAdatoRichardLettsAgreed, but someone (M@N3G3M3NT) insisted we do it in "our" timezone.
    10/8/2013 13:28216.168.115.179PaulROFL re timezone (LeonAdato) - I can relate
    10/8/2013 13:29192.30.215.5ScottSadlochaSame thing here at my previous employer.
    10/8/2013 13:2967.79.13.41francoisRobertFYI an additional web server license is really cheap, so it's worth doing if in doubt
    10/8/2013 13:29192.30.215.5ScottSadlochaMultiple timezones, but they wanted it in the timezone of HQ
    10/8/2013 13:29209.22.221.73RobertSame location?  Same subnet!
    10/8/2013 13:2967.79.13.41francoisRobertwe can talk off line, if you need more help with these questions, Robert
    10/8/2013 13:30216.168.115.179PaulMajor feature request - please remove PING as a requirement to determine if a node is up ...
    10/8/2013 13:3071.64.110.224LeonAdatoand replace it with.... carrier pigeon?
    10/8/2013 13:3071.64.110.224LeonAdatoA warm hug?
    10/8/2013 13:3067.79.13.41francoisPaulsnmp?
    10/8/2013 13:30208.57.0.134Jamesseems like any snmp response counts as up
    10/8/2013 13:30209.22.221.73RobertThe number of Elements you can monitor with a polling engine is also determined by your polling settings...the faster your polling the more resources you need.
    10/8/2013 13:3171.64.110.224LeonAdato(j/k had to take the easy joke)
    10/8/2013 13:31206.116.73.53Penneywe have add'l web server license, but i stopped using it, the administrative tasks during upgrades etc was a pain. We have 4 add'l pollers, so already upgrades were becoming quite involved. we have lots of reports - this is the painful part. solarwinds needs to improve management for orion admins with add'l web servers.
    10/8/2013 13:3167.79.13.41francoisPaulit's on the list for future, pretty high
    10/8/2013 13:3171.64.110.224LeonAdatoRobertAgreed. I've told several clients that they can monitor 500,000 elements if they only check them once per day.
    10/8/2013 13:31216.168.115.179PaulIf a node isn't pingable it trips as "down"
    10/8/2013 13:31209.22.221.73Robert10,000 would be optimal with default polling engine settings.
    10/8/2013 13:3174.112.1.129Amberwhat is the default poll time?
    10/8/2013 13:31118.209.52.240StevenJamesstatus is purely based on the ICMP
    10/8/2013 13:31216.168.115.179Paulthanks francois - we've been waiting for 4+ years for that feature
    10/8/2013 13:31167.216.131.126ManhaakAshleyIf I have an NPM SL2000 license, can I install an additional polling engine, or do I need an unlimited license to do so? And if I can, is the maximum number of elements the polling engine + main poller can handle cumulative, so that both together handle 2000 volumes, 2000 interfaces, and 2000 nodes?
    10/8/2013 13:3271.64.110.224LeonAdatoPenneyPenny - reports now synchronize automatically from primary to additional poller.
    10/8/2013 13:3271.64.110.224LeonAdatoPenneyI know what you are talking about, but it's not the case any more.
    10/8/2013 13:3271.64.110.224LeonAdatoPenneySince version 10.5.
    10/8/2013 13:3267.79.13.41francoisPaulyeah, pretty frequent request
    10/8/2013 13:3271.64.110.224LeonAdatoPenneyBefore that you had to do the funky copy-to-/Reports folder thing.
    10/8/2013 13:32204.145.114.6mdriskellIs this a custom report?
    10/8/2013 13:3371.64.110.224LeonAdatomdriskellYes, but I posted it on thwack a while ago.
    10/8/2013 13:33216.168.115.179PaulCool report - will have to check it out ... thanks!
    10/8/2013 13:33206.116.73.53Penneyhuge thanks . we r currently 10.4.2 - wasn't aware of that improvement to 10.5 - NICE.
    10/8/2013 13:332.27.145.98RhidianSIs a nice report
    10/8/2013 13:3362.105.164.174AshleyCanyone have a link to the report?
    10/8/2013 13:3471.64.110.224LeonAdatomdriskellI'm trying to grab it now, but the razzing from the guys in the office is pretty hard to cut through.
    10/8/2013 13:3471.64.110.224LeonAdato;-)
    10/8/2013 13:3468.114.34.11bwicksfrancoisDoes anyone have 100,000+ element count in 1 environment
    10/8/2013 13:34209.22.221.73RobertLeon...when you're capacity planning....how many polling engines could you lose without impacting your ability to poll all the nodes?  I'm thinking that I should plan for at least the loss of 1 polling engine and have enough capacity to support the loss of one on the other pollers..right/wrong?
    10/8/2013 13:35208.57.0.134Jamesyes
    10/8/2013 13:35204.145.114.6mdriskellhttp://thwack.solarwinds.com/docs/DOC-168997
    10/8/2013 13:3562.105.164.174AshleyCThanks buddy
    10/8/2013 13:3571.64.110.224LeonAdatoRobertDepends on the size of your site.
    10/8/2013 13:36204.145.114.6mdriskellLOL as soon as I saw it I'm like I need that report
    10/8/2013 13:3671.64.110.224LeonAdatoRobertIf you have 100,000 elements spread across 12 pollers, 1 more isn't going to do it.
    10/8/2013 13:3667.79.13.41francoisbwicksit reaches the limit, but if you decrease the polling frequency (e.g. once a day) you can go above. I think Leon said above he has seen people doing it
    10/8/2013 13:36204.145.114.6mdriskellLeon just wants more thwack points for us all downloading it
    10/8/2013 13:3671.64.110.224LeonAdatomdriskellNo not at a... well yeah
    10/8/2013 13:36209.22.221.73RobertI agree but my question is do you plan for the loss of 1 poller or 2?
    10/8/2013 13:36208.57.0.134Jameslol
    10/8/2013 13:3771.64.110.224LeonAdatoRobertThere are other options.
    10/8/2013 13:3771.64.110.224LeonAdatoRobertSolarWinds has a cold-swap option, a hot-swap option, and a live failover option
    10/8/2013 13:37118.209.52.240Steven*cough* the password is on the SW server
    10/8/2013 13:3871.64.110.224LeonAdatoRobertThat said, I would do what you need to cover 20-30% of your environment
    10/8/2013 13:3871.64.110.224LeonAdatoRobertthat's just me spitballing
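A rough sketch of the headroom question Robert raised (can the remaining pollers absorb the loss of one?) alongside Leon's 20-30% figure. Both helper functions are hypothetical. The 10,000-elements-per-poller capacity is Robert's "default polling settings" number from earlier in the chat, used here as an assumption rather than a vendor limit, and the model assumes load could be redistributed evenly across surviving pollers. The two loads are AshleyC's poller totals.

```python
from typing import Sequence


def headroom_fraction(loads: Sequence[int], capacity_per_poller: int) -> float:
    """Fraction of total polling capacity that is currently unused."""
    total_capacity = len(loads) * capacity_per_poller
    return 1 - sum(loads) / total_capacity


def survives_loss(loads: Sequence[int], capacity_per_poller: int, lost: int = 1) -> bool:
    """True if the surviving pollers could absorb the whole load after losing `lost` pollers."""
    survivors = len(loads) - lost
    if survivors <= 0:
        return False
    return sum(loads) <= survivors * capacity_per_poller


ASSUMED_CAPACITY = 10_000    # assumption: Robert's figure for default polling settings
two_pollers = [7465, 7515]   # AshleyC's poller 1 and poller 2 element totals

print(f"{headroom_fraction(two_pollers, ASSUMED_CAPACITY):.1%}")  # 25.1%
print(survives_loss(two_pollers, ASSUMED_CAPACITY))               # False
print(survives_loss([7465, 7515, 0], ASSUMED_CAPACITY))           # True
```

Under these assumptions, roughly 25% spare capacity across two pollers is still not enough to lose one of them; an idle third poller would be.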
    10/8/2013 13:3865.192.236.140JFrazieractive state perl ?
    10/8/2013 13:3871.64.110.224LeonAdatoStevenYeah, but if it crashed and you have to rebuild it, that's not an option.
    10/8/2013 13:3868.114.34.11bwicksfrancoisNetwork Elements\nTotal Count 263161\nNodes 11316\nInterfaces 223868\nVolumes 27977
    10/8/2013 13:38209.22.221.73RobertGreat....where can I find more info on the hot-swap?  Are you talking about FOE?
    10/8/2013 13:38204.145.114.6mdriskellFYI that's easier now to open tickets with the individual accounts
    10/8/2013 13:38140.142.107.178RichardLettsnow you'll have to change your DB password.
    10/8/2013 13:3866.112.206.6Chiplike mine did Friday, had to do a full bare metal rebuild
    10/8/2013 13:3966.112.206.6Chipall of the documentation that was just gone over has been added to our doc inventory
    10/8/2013 13:3971.64.110.224LeonAdatomdriskellAgreed. I did this recording before they did that, but individual accounts now ROCKS!
    10/8/2013 13:3967.79.13.41francoisbwickssingle instance?
    10/8/2013 13:3968.114.34.11bwicksfrancoisyup
    10/8/2013 13:3967.79.13.41francoisbwickswow
    10/8/2013 13:3967.79.13.41francoisbwickswhat polling frequency for status and stats?
    10/8/2013 13:39118.209.52.240StevenLeonAdatoAgreed - doco is good if your server dies completely... but for brain lapses..
    10/8/2013 13:39167.216.131.126ManhaakAshleyDo we need to purchase an FOE license, or does it come with SAM SLX?
    10/8/2013 13:3967.79.13.41francoisbwicksdid not remember it was so high...
    10/8/2013 13:40140.142.107.178RichardLettsbwickswow.
    10/8/2013 13:4068.114.34.11bwicksfrancoisyea keeps growing
    10/8/2013 13:4012.183.67.10VisibilityTeamRivrMuse n- ight T Now for ECA
    10/8/2013 13:4068.114.34.11bwicksfrancois240 sec status 9 min stat
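A back-of-the-envelope sketch of what those intervals imply. It assumes, as a simplification, that every element is polled once per interval for both status and statistics; `polls_per_second` is a hypothetical helper. The inputs are bwicks' 263,161 elements at 240-second status and 9-minute statistics polling, plus Leon's earlier "500,000 elements once per day" remark for comparison.

```python
def polls_per_second(elements: int, interval_seconds: float) -> float:
    """Average polls per second if each element is polled once per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return elements / interval_seconds


TOTAL_ELEMENTS = 263_161

status_rate = polls_per_second(TOTAL_ELEMENTS, 240)        # 240 sec status polling
stats_rate = polls_per_second(TOTAL_ELEMENTS, 9 * 60)      # 9 min statistics polling
daily_rate = polls_per_second(500_000, 24 * 60 * 60)       # 500,000 elements once per day

print(f"{status_rate:.1f}")  # 1096.5
print(f"{stats_rate:.1f}")   # 487.3
print(f"{daily_rate:.1f}")   # 5.8
```

The comparison shows why polling interval matters as much as element count: the once-a-day case generates a small fraction of the polling load despite having nearly twice the elements.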
    10/8/2013 13:4171.64.110.224LeonAdatoOk guys, let me have it!
    10/8/2013 13:4165.192.236.140JFrazierwe are about to demo rightITnow
    10/8/2013 13:41118.209.52.240StevenLots of great info in that presentation!
    10/8/2013 13:41204.145.114.6mdriskellI'd really like more info on successful use of Orion FOE.  Didn't really get too detailed on that
    10/8/2013 13:42206.116.73.53PenneyQuestion for Leon - do you usually wait for minor release, or do you jump at new releases with large installation ?
    10/8/2013 13:4267.79.13.41francoismdriskellwill ping you off line
    10/8/2013 13:4266.159.100.4AshleyI'm trying to download it myself - but for some reason it's only ever downloading the first session
    10/8/2013 13:4271.64.110.224LeonAdatoKMSigmaJust NPM and SAM for now.
    10/8/2013 13:4271.64.110.224LeonAdatoKMSigmaMy nefarious plans for world domination include VQM and NTA soon.
    10/8/2013 13:42167.216.131.126ManhaakAshley@Leon: can you give details on FOE licensing do we buy it or comes with SAM SLX
    10/8/2013 13:4371.64.110.224LeonAdatoKMSigmaStorage and Virtualization after that.
    10/8/2013 13:432.27.145.98RhidianSVQNM :D!
    10/8/2013 13:43209.22.221.73RobertLeon - What do you think about Virtualizing SQL?  Why haven't you done it?  Also, are you doing any clustering?  You didn't really talk about COOP/DR...how are you handling that?  Do you have a mirror of your 10 polling engines sitting idle in another site?
    10/8/2013 13:43208.57.0.134JamesVersion 10.6 has had some poller changes, are there any thoughts on if there is a performance increase on the pollers?
    10/8/2013 13:4367.79.13.41francoisManhaakdoes not come with SAM SLX
    10/8/2013 13:4371.64.110.224LeonAdatoPenneywe have a separate dev installation and we test RIGHT NOW stuff on that.
    10/8/2013 13:43118.209.52.240StevenmdriskellFOE Licensing is based on SAM, NPM, NCM... need to buy a P1, P2, or P3 license based on how many of those 3 modules you have
    10/8/2013 13:4371.64.110.224LeonAdatoPenneyWe also have a Proof of concept server for betas.
    10/8/2013 13:4367.79.13.41francoisJamesnot aware of any
    10/8/2013 13:43118.209.52.240Stevenmdriskelloops - not meant direct to you sorry!
    10/8/2013 13:43129.72.124.186scottweOne thing I am curious about. How granular do folks get with views for non-admin types? we've built account limitations and views but will basically let folks see anything regarding those nodes. We're finding some pretty creative "failure scenarios" with this approach. Anyone read a good best practice for help desk types or is less more?
    10/8/2013 13:43216.168.115.179PaulWe are looking for a new "ticketing system" and always looking for feedback - we are a Service Provider so we deal with internal "IT" issues but also external customer support tickets.  I'm looking at SW WHD but not convinced yet that it's really a good fit specific to our needs ... may end up taking Request Tracker and customizing the heck out of it... more a question for the group than Leon in particular....
    10/8/2013 13:44206.181.226.34KMSigmaRhidianSWe run NPM, SAM, IPAM, NCM, NTA, and VNQM.  Changes the scaling and polling numbers.
    10/8/2013 13:4471.64.110.224LeonAdatoPenneyso the betas go on the POC, the RC's go on the dev, and then the GA's go to prod.
    10/8/2013 13:4471.64.110.224LeonAdatoPenneyMinor releases and patches are tested on dev, then rolled to prod.
    10/8/2013 13:44216.170.88.15wesleyoramaLeon, We are expanding out from our single poller to a two poller in two states build out. You brought up both using duplicates to see if something is reachable from one location or the other. Then you said having duplicates can lead to a bad place. At what point should you start worrying about duplicates?
    10/8/2013 13:4471.64.110.224LeonAdatoPenneyOne note on that for the cost-conscious: talk to sales about pricing on "lab" systems.
    10/8/2013 13:45167.216.131.126ManhaakAshley@Leon: I have 1 main poller + one polling engine. With NPM SL2000, is the load they can handle cumulative, or can each sustain its own max load?
    10/8/2013 13:45204.145.114.6mdriskellfrancoisLeon...care to share what you use as your first level trap rcvr...you can ping me direct on thwack if you don't want to publish it here
    10/8/2013 13:4571.64.110.224LeonAdatoPenneyyou can often get a significant discount when the box is non-prod. AND you don't have to buy the same thing you have in prod.
    10/8/2013 13:4571.64.110.224LeonAdatoPenneyOur prod is "unlimited" but our POC server is the smallest license possible.
    10/8/2013 13:4571.64.110.224LeonAdatowesleyoramawhen you don't want them.
    10/8/2013 13:45216.170.88.15wesleyoramaThats what I was thinking
    10/8/2013 13:4571.64.110.224LeonAdatowesleyoramaSeriously, if you INTENTIONALLY duplicate a node on two pollers, it's all good.
    10/8/2013 13:46206.116.73.53PenneyLeonAdatowe have a test box with small license. I was more curious if you would apply 10.6 to prod (even after running through test server), or do you wait for 10.6.1 ?
    10/8/2013 13:4671.64.110.224LeonAdatowesleyoramaYou just want to make sure it's not happening by accident.
    10/8/2013 13:46216.170.88.15wesleyoramaLeonAdatothanks
    10/8/2013 13:4671.64.110.224LeonAdatoPenneyHayel No. We got that on in prod asap.
    10/8/2013 13:46209.22.221.73RobertAre your dev licenses the same size as what you have in production?
    10/8/2013 13:46208.57.0.134Jamesno
    10/8/2013 13:4671.64.110.224LeonAdatoPenneyBUT the point is that I'd already been beta testing and RC testing 10.6 for weeks prior, so I knew what to expect.
    10/8/2013 13:4663.226.32.16ecklerwr1D@ng can't reach youtube from work... have to wait till I get home... every question has a youtube link on quiz
    10/8/2013 13:4771.64.110.224LeonAdatoRobertOurs is, but only because we wanted to test what happened when you maxed out the hardware - which you can't do if your license limits to 150 elements.
    10/8/2013 13:472.27.145.98RhidianSThere was a post about modification to the handling of the netflow data up and coming. Will that reduce the load on the database?
    10/8/2013 13:48209.22.221.73RobertWhat did you use to max out the equipment?  Care to share your findings?
    10/8/2013 13:4871.64.110.224LeonAdatoRobertWell, I do have 9,000 nodes at my disposal.
    10/8/2013 13:4871.64.110.224LeonAdatoRobert;-)
    10/8/2013 13:4971.64.110.224LeonAdatoRobertAnd the result was that the polling cycles throttled down and we missed cycles.
    10/8/2013 13:49209.22.221.73RobertOur production and dev environments aren't connected..so I'd have to simulate.
    10/8/2013 13:492.27.145.98RhidianSThats good Sandip, will give me time to look in the morning (GMT here and don't fancy a test tonight )
    10/8/2013 13:5062.105.164.174AshleyC@Leon random question but do you have any examples of alert "DashBoards"? We have a rather large NOC with a monitoring display on there for Active Alerts "Just on NODES" but it is not very effective visual aid to issues
    10/8/2013 13:50140.142.107.178RichardLettsPaulIf you're looking at linking to a ticketing system then you might want to look at integration products to do better alert correlation.
    10/8/2013 13:5063.226.32.16ecklerwr1I agree RhidianS
    10/8/2013 13:51216.168.115.179PaulThanks RichardLetts - we are completely open to ideas ... I was hoping for something that has tight integration options vs dependency on filtering email alerts etc
    10/8/2013 13:51209.22.221.73RobertLeonAdato: are you running EOC as well? If so, just one for your environment?
    10/8/2013 13:5163.226.32.16ecklerwr1Paul if you're a coder... check out the API to link to ticket system.
    10/8/2013 13:522.27.145.98RhidianSI'm 'playing' with GNS3 and I'm looking at using Ostinato for creating traffic. Has anyone tried such a virtual setup?
    10/8/2013 13:52140.142.107.178RichardLettsPauljust started looking at RightITNow -- I know others that are into alerting in a big way are using/looking at it.
    10/8/2013 13:5263.226.32.16ecklerwr1Paul it's also been done by people with alerts emailing ticket systems.
    10/8/2013 13:5271.64.110.224LeonAdatoAshleyCThat (dashboards) is one of the areas that my colleagues haven't asked for yet.
    10/8/2013 13:5271.64.110.224LeonAdatoAshleyCPartially because the culture is still very much ticket-driven
    10/8/2013 13:52216.168.115.179PaulThanks Ecklerwr1 - I'm not a heavy coder but have budget to bring one onboard - just trying to settle in on a solid option and then figure out integration options etc
    10/8/2013 13:5371.64.110.224LeonAdatoAshleyCand Partially because I've been so busy getting the environment up and running that I haven't had a chance to "market" that feature yet.
    10/8/2013 13:53140.142.107.178RichardLettsPaulalerts emailing ticketing systems is okay, but if something fires 6000 emails into your ticketing system then it's a bit of a pain to clean up.
    10/8/2013 13:5371.64.110.224LeonAdatoAshleyCIn my last gig, We did a number of NOC-specific displays though.
    10/8/2013 13:5362.105.164.174AshleyCLeonAdatoAh Ok.. I am working on something a little better than the out of the box Active Alerts that I am hoping to share once I have it completed.
    10/8/2013 13:5371.64.110.224LeonAdatoAshleyCI ended up hacking the webserver to bits.
    10/8/2013 13:5371.64.110.224LeonAdatoAshleyCRe-writing CSS, etc.
    10/8/2013 13:53118.209.52.240StevenWe've set up some integration with ticketing systems however it's a one-way affair. ie. sending an email or trap to the ticketing system which then handles the alert/notifications/escalations/etc. It doesn't talk back to the SW server though
    10/8/2013 13:5362.105.164.174AshleyCLeonAdatoAHH ok
    10/8/2013 13:53216.168.115.179PaulThanks RichardLetts - yeah, we want to take a careful approach for sure
    10/8/2013 13:5362.105.164.174AshleyCLeonAdatoI thought as much
    10/8/2013 13:5471.64.110.224LeonAdatoAshleyCI can give you some thoughts on that if you like.
    10/8/2013 13:5471.64.110.224LeonAdatoAshleyCUnless any of my Sentinel buddies are on this session.
    10/8/2013 13:5462.105.164.174AshleyCLeonAdatothere isn't anything really out of the box to make an effective display
    10/8/2013 13:5471.64.110.224LeonAdatoAshleyCI think there's more than you think.
    10/8/2013 13:5471.64.110.224LeonAdatoAshleyCdepending on what you are trying to do.
    10/8/2013 13:54216.168.115.179PaulThanks Shuth - I'm really hung up on using API/REST or similar integration for two way function ... maybe I'm dreaming but was hoping someone had already developed this in line with my "vision"
    10/8/2013 13:5471.64.110.224LeonAdatoAshleyCthe hardest part is getting the font sizes and colors to work on a large display.
    10/8/2013 13:5462.105.164.174AshleyCLeonAdatoIf you have documentation that would be brilliant.
    10/8/2013 13:5562.105.164.174AshleyCLeonAdatoYeah thats true the font size is an issue.
    10/8/2013 13:5562.105.164.174AshleyCLeonAdatoalso the color of charts is a pain being white
    10/8/2013 13:5571.64.110.224LeonAdatoAshleyCI might be able to scrounge something up.
    10/8/2013 13:5562.105.164.174AshleyCLeonAdatobeing able to invert them would be brilliant
    10/8/2013 13:5562.105.164.174AshleyCLeonAdatoThank you very much I really appreciate it
    10/8/2013 13:5671.64.110.224LeonAdatoAshleyCIf that's what you mean, the fix was "simple" (copy the /inetpub/solarwinds folder to another folder, make it a new website on a new port, point your NOC displays to that new website:port, and then get to work)
    10/8/2013 13:56209.22.221.73RobertIs it possible to view the chat sessions afterwards?
    10/8/2013 13:5671.64.110.224LeonAdatoAshleyCnot EASY, but it was simple.
    10/8/2013 13:5671.64.110.224LeonAdatoAshleyCFriend and then DM me on thwack so I can send you what you need.
    10/8/2013 13:5668.114.34.11bwicksfrancoisLeon what is your data retention for detail in both NPM and SAM?
    10/8/2013 13:5671.64.110.224LeonAdatoAshleyCthwack name is "adatole"
    10/8/2013 13:5671.64.110.224LeonAdatobwicksThe standard.
    10/8/2013 13:5671.64.110.224LeonAdatobwicksDetail summarizes to hourly every week
    10/8/2013 13:5771.64.110.224LeonAdatobwickshourly to daily every month
    10/8/2013 13:5762.105.164.174AshleyCLeonAdatoI will get to adding you
    10/8/2013 13:5768.114.34.11bwicksfrancoisk ty
    10/8/2013 13:5771.64.110.224LeonAdatobwicksDaily is kept for 365 days.
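The retention scheme LeonAdato describes (detail rolled up to hourly after a week, hourly rolled up to daily after a month, daily kept for 365 days) can be sketched as a simple age-to-granularity lookup. This is only an illustration: the 7/30/365-day thresholds are an assumed reading of "every week" and "every month", and `granularity_for_age` is a hypothetical helper, not part of any SolarWinds API.

```python
from datetime import timedelta

# Assumed thresholds, read from the figures quoted in the chat.
DETAIL_DAYS = 7     # detail data summarized to hourly after about a week
HOURLY_DAYS = 30    # hourly data summarized to daily after about a month
DAILY_DAYS = 365    # daily data kept for 365 days


def granularity_for_age(age: timedelta) -> str:
    """Return the granularity at which a sample of the given age is stored."""
    days = age.total_seconds() / 86400
    if days < DETAIL_DAYS:
        return "detail"
    if days < HOURLY_DAYS:
        return "hourly"
    if days < DAILY_DAYS:
        return "daily"
    return "purged"
```

For example, under these assumptions a sample from ten days ago would only be available as an hourly average, which matters when sizing the database or troubleshooting an older incident.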
    10/8/2013 13:5762.105.164.174AshleyCLeonAdatoAgain thank you for your time
    10/8/2013 13:58216.168.115.179PaulYourVillageIdiot - love the name! hahaha
    10/8/2013 13:5863.234.242.12YourVillageIdiotFor large displays I've used Atlas maps with square tiles referencing location based groups with parent/child dependencies. One display for location state. On other displays I've tiled (linear) gauges and filtered event views...
    10/8/2013 13:5863.234.242.12YourVillageIdiotthanks paul
    10/8/2013 13:592.27.145.98RhidianSRobertRight click on white space in chat window and 'Save the chat log' is best I can see
    10/8/2013 13:59209.22.221.73RobertRhidianS: Thanks!
    10/8/2013 13:5963.234.242.12YourVillageIdiotThe large tiles provided line of sight, red light = bad. Pop that view on your local display to drill in to find the device(s) in harm's way.
    10/8/2013 14:00206.188.37.152scottmIs there a better way to set up the parent child dependencies beyond going in and setting them all up? I have 51 sites, and I haven't gotten a round tuit, yet, because it's so manual.
    10/8/2013 14: worries
    10/8/2013 14:00192.30.215.5ScottSadlochaYou can go into Manage Dependencies and set them up
    10/8/2013 14:01192.30.215.5ScottSadlochaWhat I did at my previous place was set up groups
    10/8/2013 14:01192.30.215.5ScottSadlochaAnd use those in the dependencies
    10/8/2013 14:0163.234.242.12YourVillageIdiotHaven't found a quick route for the dependencies, but creating the location groups based upon simple logic (subnet for example) provided the seeds.
    10/8/2013 14:02206.188.37.152scottmScottSadlochaAh, we have set up groups based on the location string. That might not be too bad.
    10/8/2013 14:0262.105.164.174AshleyCWe have a lot of xDSL connections being monitored on our platform and we have the Worldmap resource on the display, which helps rule out geographical issues, e.g. an exchange failure
    10/8/2013 14:02192.30.215.5ScottSadlochaWhat I did was this
    10/8/2013 14:02192.30.215.5ScottSadlochaWe had some sites with shop floor firewalls, some without
    10/8/2013 14:0263.234.242.12YourVillageIdiotWe have 230 sites at Patterson, so the dependencies required a little time to match them up but once done the parent/child membership remains dynamic.
    10/8/2013 14:02192.30.215.5ScottSadlochaI created a group for all internal nodes and used a query to define the group members
    10/8/2013 14:04192.30.215.5ScottSadlochaWe were also using HSRP at the sites, and I set up a Node for the HSRP
    10/8/2013 14:04192.30.215.5ScottSadlochaI then listed HSRP as the parent, and the group was the child
    10/8/2013 14:0462.105.164.174AshleyCScottSadlochaI will copy that
    10/8/2013 14:0462.105.164.174AshleyCScottSadlochagreat Idea as we use HSRP for failover solutions
    10/8/2013 14:05192.30.215.5ScottSadlochaThis way, the site had to be hard down to trigger the dependency. Switching to backup router did not trigger it
    10/8/2013 14:05192.30.215.5ScottSadlochaRight
    10/8/2013 14:06192.30.215.5ScottSadlochaI set it up so that the HSRP was a node in itself and was the parent. It worked well.
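The logic behind ScottSadlocha's setup can be sketched roughly as follows. It assumes the HSRP virtual address keeps answering as long as either router is active, and that a child whose parent is down is reported as "unreachable" rather than "down". Both function names are hypothetical and only illustrate the idea; they are not Orion calls.

```python
def hsrp_parent_up(primary_up: bool, backup_up: bool) -> bool:
    """The HSRP node (virtual IP) stays up while at least one router is active."""
    return primary_up or backup_up


def effective_child_status(parent_up: bool, child_responding: bool) -> str:
    """Status of a node in the child group, given the state of its parent."""
    if child_responding:
        return "up"
    # The child is silent: treat it as a real outage only if the parent is up.
    return "down" if parent_up else "unreachable"
```

With the HSRP node as the parent, a failover to the backup router leaves the parent up, so the dependency only takes effect when the site is hard down.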
    10/8/2013 14:0662.105.164.174AshleyCScottSadlochanice
    10/8/2013 14:0662.105.164.174AshleyCScottSadlochaI like the thought behind it as I was also thinking along those lines
    10/8/2013 14:09206.188.37.152scottmHmm... I have groups for each site, but the node that would be the parent is in the group, and you can't do that. I may have to set up another dynamic group that excludes the parent node.
    10/8/2013 14:1163.234.242.12YourVillageIdiotwith the child and parent groups using a query, I filtered out the parent devices.
    10/8/2013 14:11192.30.215.5ScottSadlochaAnd Scottm, yes, you have to set up a group that excludes the parent node, otherwise you will get a message indicating that the node is part of the group
    10/8/2013 14:12192.30.215.5ScottSadlochaThat is the way I did it as well, YourVillageIdiot
    10/8/2013 14:1263.234.242.12YourVillageIdiotin our case of location system groups, filtered out device_type router (custom property).
    10/8/2013 14:1463.234.242.12YourVillageIdiotwhere we have primary and secondary routers at each site, both of those devices are in the parent router group.
    10/8/2013 14:15192.30.215.5ScottSadlochaWe had all nodes at a site using a site designator name, such as ZWCDE for Zwickau, Germany.
    10/8/2013 14:16192.30.215.5ScottSadlochaThen we included any device with "ZWCDE" but excluded ZWCDEHSRP
    10/8/2013 14: question on pollers, what is the time out on the Auto poller synchronisation?
    10/8/2013 14:16192.30.215.5ScottSadlochaOr for sites using a single router, exclude SiteNameRTR01
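The naming-based membership rule described above (include every node whose name contains the site code, exclude the parent) can be sketched like this. `site_group_members` is a hypothetical helper, and every node name in the usage note other than ZWCDEHSRP is made up for illustration. In Orion itself this would be a dynamic group query rather than code.

```python
def site_group_members(node_names, site_code, parent_names):
    """Return nodes whose name contains site_code, leaving out the parent node(s)."""
    parents = {name.upper() for name in parent_names}
    code = site_code.upper()
    return [
        name for name in node_names
        if code in name.upper() and name.upper() not in parents
    ]
```

For instance, `site_group_members(["ZWCDESW01", "ZWCDEHSRP", "ABCUSSW01"], "ZWCDE", ["ZWCDEHSRP"])` keeps only the hypothetical switch `ZWCDESW01`, so the parent never ends up inside its own child group.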
    10/8/2013 14:1963.234.242.12YourVillageIdiotSimilar here. Referenced our site-code scheme with city name to identify the groups. Where those properties are applied to each device it works well with incident management.
