
Geek Speak


Unlike most application support professionals, or even system administrators, as database professionals you have the ability to look under the hood of nearly every application you support. In my fifteen-plus years of being a DBA, I have seen it all. I’ve seen bad practices and best practices, worked with vendors who didn’t care that what they were doing was wrong, and worked closely with others to improve the performance of their systems.

 

One of my favorite stories was an environmental monitoring application—I was working for a pharmaceutical company, and this was the first new system I helped implement there. The system had been up for a week and performance had slowed to a crawl. After running some traces, I confirmed that a query without a WHERE clause was scanning 50,000 rows several times a minute. Mind you, this was many years ago, when my server had 1 GB of RAM—so this was a very expensive operation. The vendor was open to working together, and I helped them design a WHERE clause, an indexing strategy, and a parameter change to better support the use of the index. We quickly implemented a patch and got the system moving again.
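For illustration only (the table, column, and index names below are hypothetical stand-ins, not the vendor's actual schema), the fix boiled down to something like turning an unfiltered scan into a narrow, index-supported query:

-- Before: the vendor query scanned every row, several times a minute
-- SELECT SensorId, ReadingValue, ReadingTime FROM dbo.Readings;

-- After: filter to a recent window and back it with a covering index
CREATE INDEX IX_Readings_ReadingTime
    ON dbo.Readings (ReadingTime)
    INCLUDE (SensorId, ReadingValue);

SELECT SensorId, ReadingValue, ReadingTime
FROM dbo.Readings
WHERE ReadingTime >= DATEADD(MINUTE, -5, GETDATE());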

 

Microsoft takes a lot of grief from DBAs for products like SharePoint and Dynamics, and for some of the interesting design decisions made within them. I don’t disagree—there are some really bad database designs out there. However, I’d like to give credit to whoever designed the System Center Configuration Manager (SCCM) database—it has a very logical data model (it uses integer keys—what a concept!), and I was able to build a reporting system against it.

 

So what horror stories do you have about vendor databases? Or positives?

The terms EMS, NMS, and OSS are often misunderstood and used interchangeably. This can sometimes lead to confusion about which system performs which functions, so I am attempting to clarify these terms in a simple way. This may help you make informed decisions when procuring management systems.

 

But before getting into the terms themselves, one should understand what FCAPS is in relation to management systems. In fact, every management system should perform FCAPS. The five letters in FCAPS stand for the following:

 

F - Fault Management: Reading and reporting of faults in a network; for example, link failure or node failure.

 

C - Configuration Management: Loading or changing the configuration of network elements and configuring services in the network.

 

A - Accounting Management: Collection of usage statistics for the purpose of billing.

 

P - Performance Management: Reading performance-related statistics, for example utilization, error rates, packet loss, and latency.

 

S - Security Management: Controlling access to network assets. This includes authentication, encryption, and password management.

 

Ideally, any management system should perform all of the FCAPS functions described above. However, some commercial solutions provide only some of them, in which case an additional management system will be needed to cover the rest. FCAPS applies to all types of management systems, including EMS, NMS, and OSS.

 

Now that we have covered the general functions of management systems, let’s look at the terms EMS, NMS, and OSS.

 

EMS stands for “Element Management System”; it is also called an element manager. An EMS can manage (i.e., perform FCAPS on) a single node/element or a group of similar nodes. For example, it can configure a particular node or group of nodes and read their alarms.

 

An NMS (Network Management System), on the other hand, manages a complete network: it covers all the functions of an EMS and also performs FCAPS on the communication between different devices.

 

So the difference between an EMS and an NMS is that an NMS understands the inter-relationships between individual devices, which an EMS cannot. An EMS can manage a group of devices of the same type, but it treats each device in the group as a standalone unit and does not recognize how individual devices interact with one another.

 

So to sum up:

NMS = EMS + link/connectivity management of all devices + FCAPS on a network basis.

 

An NMS can manage different types of network elements and technologies from the same vendor.

 

An example should clarify this. An EMS can report individual alarms on nodes, but an NMS can correlate alarms across different nodes and thus identify the root-cause alarm when a service is disrupted. It can do so because it has network-wide view and intelligence.

 

An OSS (Operations Support System) takes this a step further: it can manage not just a single vendor but multiple vendors. An OSS is needed in addition to the vendor-specific NMSs; it interacts with the individual NMSs and provides one dashboard for FCAPS management.

 

An OSS can thus give a single, end-to-end view of the network across all vendors. An example would be a service provisioning tool that can create an end-to-end service between Cisco and Juniper routers. This would require an OSS that can talk to the NMSs of both vendors, or even configure the network elements directly.

 

Having explained the terms EMS, NMS, and OSS, I will wrap up my blog by asking:

 

  • Does your management system do all the FCAPS functions or some?

 

  • Do you prefer to have one network management system that does all FCAPS or different ones depending on different specialized functions?

 

Or maybe I should ask: are you using any management system at all?

 

Would love to hear your opinion!

Email has become the core of every organization. It is the primary means of communication that everybody turns to, regardless of the type of email server they are running. Who doesn’t have email? When email is down, communication is down, resulting in lost productivity and potentially thousands to millions of dollars lost.

 

Microsoft Exchange Server is the most widely used on-premises enterprise email system today. When it works…it works great. But, when Exchange is not working correctly, it can be a nightmare to deal with.

 

For an Exchange administrator, dealing with Exchange issues can be challenging because there are so many variables when it comes to email. A user complaining about email being “slow” can be the result of many different factors. It could be a network issue, a desktop client issue, or even a poorly performing Exchange server. So many possibilities for what is wrong, and one unhappy user.

 

Not only do issues arise in everyday working situations, but if you are preparing for a migration or an Exchange upgrade, there are always “gotchas.” These are things that get overlooked until something breaks, and then everybody scrambles to fix it and does damage control later. These are probably the most annoying to me because oftentimes somebody else has run into the problem first. So, if I had known about the gotchas, I could have been prepared.

 

I recently had the opportunity to present a webinar, “The Top Troubleshooting Issues Exchange Admins Face and How to Tackle Them.” One of the great things about the IT community is that it’s there for us to share our knowledge so others can learn from our experiences. That’s what I hoped to do in this webinar: share some tips on solving annoying problems, as well as some tried-and-true lessons from managing Exchange myself. We discussed some of the challenges with Exchange migrations, mailbox management issues (client issues), and even Office 365. You can view a recording of the webinar here.

 

Since our time was limited, I could not answer all the questions that were asked during the webinar, so I wanted to take an opportunity to answer some of them here:

 

1. Is there a size limit on Outlook 2010/Exchange 2010? We had a laptop user with a 20 GB mailbox with cache mode enabled who had an issue with his offline address book dying within a day of his cache being resynced. We saw it as an issue with him trying to view other users’ calendars.

 

This type of issue can have many causes and could be a whole blog in itself, so I will keep it short. Large mailboxes create large OST files locally on the machine, and those files can become corrupt. If that is the case, creating a new OST file may resolve your issue. For problems viewing others’ calendars, you can try removing the shared calendar and re-adding it. Also, double-check calendar permissions!

 

2. Do you know about the issue where installing Exchange 2010 SP3 fails at 'removing exchange files'? SP3 is needed for upgrading to Exchange 2013.

 

There is a known issue where Exchange SP3 fails to remove setup files when PowerShell script execution is defined in Group Policy. For more details on this issue, see Microsoft KB 2810617.


3. Any other resources we should review for Exchange Best Practices &/or Monitoring Tips (Other than Thwack?)

 

The Microsoft TechNet site has Exchange best practices, as well as monitoring tips, that can be helpful. There are also various Microsoft MVP sites that can be helpful, such as:

 

http://exchangeserverpro.com/

http://www.stevieg.org/

http://www.expta.com/

 

 

4. Any advice for having Exchange 2003 and Exchange 2010 coexist until migration is completed?

 

Coexistence periods can be a challenge to manage and are best kept as short as possible. Microsoft provides checklists and documentation that can help with coexistence, which can be found on their TechNet site.

 

5. Is it possible to add more than 10 shared mailboxes to outlook 2010 client?

 

Yes, it is possible to have more than 10 shared mailboxes. Outlook 2010 and Outlook 2013 have a default limit of 10 mailboxes, with a maximum supported limit of 9999 accounts. To configure Outlook for more than the default limit, you will need to edit registry settings or apply a Group Policy.

 

6. Is there a way we can enable "On behalf of" for a shared mailbox, so the end user who receives the email knows who sent it?

 

To enable send-on-behalf for a shared mailbox, you can configure delegates for the mailbox. You can also apply the send-on-behalf setting under the mail flow settings in the mailbox account properties.

 

 

I had a great time participating in the webinar with Leon and hope our viewers were able to take back some tips to help within their Exchange environments. Exchange is at the core of most businesses, and managing an Exchange environment can be challenging; I know that from personal experience. However, with the right tools by your side, you can tame the beast and keep the nightmare events to a minimum.

 

Watching David Letterman sign off was a reminder that old shows are still great, but they are often replaced by new shows that resonate better with current times. It’s a vicious cycle driven by the nature of business. The same can be said of IT management with its constant struggle of old vs new management methods.

 

The old IT management principles rely on tried-and-true and trust-but-verify mantras. If it isn’t broke, don’t go breaking it. These processes are built on experience and born of IT feats of strength. Old IT management collects, alerts, and visualizes the data streams. The decision-making and actions taken rest in the hands of IT pros, and trust is earned over a long period of time.


Outdated is how new IT management characterizes the old management ways: too slow, too restrictive, and too dumb for the onset of new technologies and processes. New IT management is all about policies and analytics engines that remove the middle layer—IT pros. Decisions are made automatically by the analytics engine, while remediation actions leverage automation and orchestration workflows. Ideally, these engines also learn over time, so they become self-sustaining and self-optimizing.


The driver for the new management techniques lies in business needs. The business needs agility, availability, and scalability for its applications. Whether they are developing the application or consuming it, the business wants these qualities to drive and deliver differentiated value. Applications are fundamental to the business initiatives and the bottom line.


Where does your organization sit on the IT management curve: more old and less new, less old and more new, or balanced? Stay tuned for Part 2 of 2015 IT Management Realities.

Look, almost all of us have been there: you’re slogging through Monday morning after staying up late watching The Walking Dead and Game of Thrones.

 

That new hot-shot application owner is guaranteeing that his new application solution is the best thing since sliced bread, and of course it’s more bulletproof than anything else you’ve ever heard of.

 

As monitoring geeks we know deep down that we can help make that application everything someone else wants it to be. We’re going to monitor it, not just for uptime and utilization, but for application performance and reliability. We have the capability and responsibility to help the business deliver on those promises.

 

So, how can one even hope to perform this dark magic I’m suggesting?


Standing up for standards


Elementary! As monitoring geeks, we have a bevy of tools in the box; but just as important as those tools, we have standards. Standards that we adhere to, advocate, and answer to.


Setting monitoring implementation standards for hardware and application platforms yields a standardized process tailored to each individual environment. This affords a consistent and streamlined monitoring experience; even if the platforms and applications are diverse, the process can remain the same.


Eliminating the impossible


For starters, we’re going to eliminate some of the guesswork by closely examining our scheduled discovery results, keeping an eye out for any wayward hardware platforms that require further inquiry and ensuring they’re attributed with the appropriate custom properties. Then we start asking some critical questions to narrow our focus, which may include:

 

  • What does the app do and who utilizes it?
  • What is it running on and where?
  • What OS does it require?
  • What database does it use?
  • What languages are running the application?
  • What processes and services are needed to make it function?
  • Does it have a Web portal and ports that need to be monitored?
  • Who needs to know when there is a problem?
  • What alerts are needed? Up and down? Do you want to know when specific components go into warning or critical states?

 

Consistently asking these standard questions at the onset of every monitoring activity will help build your customized standard, allowing you to target the unknowns more quickly and start amassing data points.


As your monitoring system continues to amass those data points, riddle me this, hero: what good is the data if we never look at it?

 

The devil is in the data

 

Scheduled data review is incredibly important for trend detection and data integrity. Start digging into that mountain of data you’ve been collecting with canned or custom reports, then schedule them to be sent straight to your inbox. Review them weekly, monthly, and quarterly. You might be surprised at what you find (or what you don’t!).

 

After consistently reviewing the information, you will be able to start sorting and collating that data into quantifiable metrics to show, for example, the ridiculous availability and uptime of the hot shot’s new application.

 

This charted data is now powerful business intelligence for decision makers when budgets get tighter, or just a good measurement for regulatory reporting.

 

By standardizing the appropriate level of hardware and application monitoring, scheduling automated reports, and reviewing the data, you ensure the business’s applications and services are delivered reliably time and time again.

 

What monitoring standards do you utilize to consistently deliver applications and services to your constituents?


Career management is one of my favorite topics to write and talk about, because I can directly help people. Something I notice as a consultant going into many organizations is that many IT professionals aren’t thinking proactively about their careers, especially those who work in support roles (supporting an underlying business, not directly contributing to revenue like a consulting firm or software development organization). One key thing to think about is how your job role fits into your organization—this is a cold, hard, ugly fact that took me a while to figure out.

 

Let’s use myself as an example—I was a DBA at a $5B/yr medical device company that didn’t have tremendous dependencies on data or databases. The company needed someone in my slot—but frankly, beyond a point, it did not matter how good that person was at their job. Any competent admin would have sufficed. I knew there was a pretty low ceiling on how far my salary and personal success could go at that company. So I moved to a very large cable company—they weren’t a technology company per se, but they were a large enough organization that high-level technologist roles were available—and I got onto a cross-platform architectural team that was treated really well.

 

I see a lot of tweets from folks who often seem frustrated in their regular jobs. The unemployment rate in database roles is exceedingly low—especially for folks like you who are actively reading and staying on top of technology—so don’t be scared to explore the job market; you might be pleasantly surprised.

Guys, I'm really excited to be part of the Thwack community.

 

My first post, "THERE MUST BE A BETTER WAY TO MANAGE VIRTUALIZED SYSTEMS", reached more than 2,550 people, and my second post reached over 800 people, plus a bunch of people who actively participated. This is a great sign and shows that you all enjoy being part of this community.

 

In my previous posts, we covered how you are managing your virtualized systems and which features you are using most.

 

In today's post, I would like to discuss one particular part of your virtual infrastructure: your Virtual Desktop Infrastructure (VDI). Common questions I come across are: how do I right-size my VDI infrastructure, how many IOPS will my users generate, and should I utilize VMware View ThinApps or Citrix XenApp? I found the VDI Calculator by Andre Leibovici very helpful.

 


 

Additionally, I found LoginVSI to be a great tool for VDI storage benchmarking and for finding out how many VMs you can actually host on your system. I doubt that many of you are using it, since it isn’t cheap, but if you are using it or have used it in the past, you know what I’m talking about. This tool fully simulates real VDI workloads, not just an artificial vdbench/sqlio/fio load. VMware View Planner is also supposed to be a great tool for benchmarking and right-sizing your environment, but I haven’t touched it just yet. Have you?

The last tool in my VDI repository is one created by the VMware Technical Marketing Group: the VMware OS Optimization Tool. I am not going into too much detail here; just click on the VMware OS Optimization Tool link and read my blog post about it. It is a great tool that can be used to create your golden VDI image.

 

If you know of other useful VDI tools, or you have used the tools I’ve mentioned above, please comment and share your experience with us. Let’s make this post a great resource for all VDI admins out there.

Fault Management (FM) and Performance management (PM) are two important elements of OAM in layer 2 and layer 3 networks.

 

FM covers fault management related to the connectivity/communication of end stations, while PM involves monitoring the performance of a link using statistics like packet loss, latency, and delay variation (also called jitter).

 

Here we need to differentiate between layer 2 and layer 3 networks.

 

For layer 2 networks, FM is usually done using CCMs (continuity check messages), while PM is done using standard protocols like 802.1ag and Y.1731, which can monitor all the parameters mentioned above.

 

For layer 3 networks, ping and traceroute are the primary tools for FM and by far the most widely used tools for troubleshooting, while IP SLA is one of the PM tools for Cisco devices. IP SLA can monitor all the key stats, including loss, latency, and delay variation at the IP layer (it can also do so at layer 2), in addition to helpful stats for VoIP like MOS score. (Please note that Cisco uses the term IP SLA for both layer 3 and layer 2 links, even though the layer 2 stats are on the Ethernet layer.)

 

Coming from a carrier Ethernet background in my last job, when I look back I can say that these tools, especially the PM tools at layer 2, were not used very often. That may be because many people were not aware of them, or because the pass/fail thresholds for performance measurements were not very well defined. Recently, the Metro Ethernet Forum (MEF) has done a great job of standardizing the thresholds and limits for jitter, delay, and packet loss. As a result, PM tools have started gaining acceptance industry-wide and are being rolled out in layer 2 service provider networks more actively.

 

However, I am quite curious about how often OAM tools are used in IP networks.

 

Fault management tools like ping and traceroute are the bread and butter of an IP engineer when it comes to troubleshooting networks, but I am especially interested in learning more about IP SLA and its use in your networks.

 

So my questions to you would be:

 

  • How often do you use IP SLA (or any similar tool) in your network? Do you use it for specific applications like VoIP?

 

  • Do you use it for both layer 2 and layer 3 networks, and in enterprise as well as service provider environments?

 

  • Are the PM thresholds (delay, jitter, and packet loss) well defined, either by Cisco or by any standards body?

 

Would love to hear your opinion here!

cxi

Trust... But Verify

Posted by cxi May 18, 2015

Let me start by saying, wow and thank you to everyone who maintains such a high activity in this community. While I may occasionally share some jibber jabber with all of you, you are all the real champions of this community and I cannot thank you enough for your contributions, feedback and more!

 

This leads me to my segment this week... one where I welcome your contributions, as always...

 

Trust; But Verify!

 


 

This line of thought isn't limited to Authentication, but it certainly shines as a major element of a trust model.

How many times are we put into a position of, "Oh yeah, it's all good, no one has access to our systems without two-factor authentication!" "What about service accounts?" "...crickets"

 

I've been there. My account to log in and look at files, personal email, etc. has such a high level of restraint and restriction that it requires everything under the sun: username, password, secret PIN, blood sample, DNA matrix... Yet the admins themselves, either directly through elevated accounts or indirectly through service accounts and other credentials, are 'secured' with a simple password. "Oh, we can't change the password on the account because it takes too long, so it goes unchanged for 60, 90, 180 days... or never?"

 

Now, not every organization operates this way. I remember having tokens back in the 90s for authentication and connectivity to Unix systems, but that kind of setup is truly few and far between.

 

I won't even go into the model whereby people 'verify' and 'validate' the individual who is hired to protect and operate in the network, as that's VERY much outside the scope of this little blog, but it leaves the question... how far do we go?

 

What do you feel is an appropriate authentication strategy? One factor (password), two factor (password plus something else), or an even more complex intermixing of multiple methods?

And forget what we 'think'; what do you actually see implemented?

 

What do you prefer? Love, Hate, Other!



 

FEED ME SEYMOUR! That's what I hear when anyone jumps to the conclusion that the database needs more CPU, memory, or faster disks. Why? Because I'm a DBA who has seen this too many times. The database seems to be the bottleneck, and there's no shortage of people suggesting more system resources. I usually caution them: "Don't mistake being busy with being productive!" Are we sure the workload is cleverly organized? Are we applying smart leverage with proper indexing, partitioning, and carefully designed access patterns, or are we simply pouring data into a container, doing brute-force heavy lifting, and running into concurrency issues?

 

Now if you're a SysAdmin, you might be thinking: this is DBA stuff, why do I care? The reason you should care is that I've seen, too many times, resources added only to produce a bigger monster!

 

So for crying out loud, please don't feed the monsters.  THEY BITE!

 

To demonstrate my point, I'll tell you about an interesting discovery I explored with Tom LaRock AKA @SQLRockstar while creating demo data for the release of Database Performance Analyzer 9.2. I set out to do the opposite of my usual job. I set out to create problems instead of solving them.  It was fun!  :-)



My primary goal was to generate specific wait types for the following (a quick query for spotting these wait types on your own server follows the list):

  1. Memory/CPU - Not a real wait type. We put this in the wait type field when a session is working rather than waiting, because the only thing it is waiting on is the CPU and memory to complete whatever task it was given.
  2. ASYNC_NETWORK_IO - Ironically, this is seldom truly a network problem, but it could be and may be interesting to a SysAdmin.
  3. PAGEIOLATCH_XX - These are significant signs that you're waiting on storage.
  4. LCK_M_X - This is a locking wait type and locking can harm performance in ways that adding system resources can't help.
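As a quick aside (my addition, not part of the original demo), you can see how much of each of these wait types has accumulated on your own instance with the sys.dm_os_wait_stats DMV. A minimal sketch:

SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('ASYNC_NETWORK_IO', 'LCK_M_X', 'LATCH_EX')
   OR wait_type LIKE 'PAGEIOLATCH%'   -- PAGEIOLATCH_SH, PAGEIOLATCH_EX, etc.
ORDER BY wait_time_ms DESC;

(Memory/CPU is left out because, as noted above, it is not a real wait type.)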

 

I knew that table scans cause all sorts of pressure, so I created one process that used explicit transactions to insert batches into a table in a loop, while four processes ran SELECT queries in infinite loops. To maximize the pain, I ensured that they'd always force full table scans on the same table by using a LIKE comparison in the WHERE clause against a string with wildcards. There's no index in the world that can help this! In each pass of their respective loops, the readers wait a different amount of time between table scans: 1, 2, 3, and 4 seconds respectively. Three of the processes use the NOLOCK hint while one of them does not. This created a pattern of alternating conflicts for the database to resolve.
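Here is a minimal sketch of one of those four reader loops; the table and column names (dbo.OrderNotes, NoteText) are hypothetical stand-ins, not the actual demo schema:

WHILE 1 = 1
BEGIN
    -- The leading wildcard defeats any index and forces a full table scan.
    SELECT COUNT(*)
    FROM dbo.OrderNotes WITH (NOLOCK)   -- three of the four readers used NOLOCK
    WHERE NoteText LIKE '%monster%';

    WAITFOR DELAY '00:00:02';           -- each reader pauses 1, 2, 3, or 4 seconds
END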


So I got the wait types I targeted, but LATCH_EX just sort of happened. And I'm glad it did! Because I also noticed how many signal waits I'd generated, and that the CPU was only at 50%. If signal waits accounting for more than 20% of our waits is cause for concern (and it is), then why does the server say the CPU is only around 50% utilized? I found very little online to explain this directly, so I couldn't help myself. I dug in!
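If you want to check this ratio yourself, a rough sketch (not the exact query I used) against sys.dm_os_wait_stats looks like this; it compares time spent waiting for CPU (signal wait) to total wait time:

SELECT SUM(signal_wait_time_ms)                              AS signal_wait_ms,
       SUM(wait_time_ms)                                     AS total_wait_ms,
       100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms)  AS signal_wait_pct
FROM sys.dm_os_wait_stats;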


My first suspect was the LATCH_EX waits, because I'd produced an abundance of them and the queries generating them had the most wait time. But I wasn't sure why this would cause signal waits, because having high signal waits is like having more customers calling in than staff to answer the phones. I really didn't have much running, so I was puzzled.

 

The theory I developed was that when SQL Server experiences significant LATCH_EX contention, it may need to spawn additional threads to manage the overhead, which may contribute toward signal waits. So I asked some colleagues with lots of SQL Server experience and connections with other experienced SQL Server pros. One of my colleagues had a contact deep within Microsoft who was able to say with confidence that my guess was wrong. Back to the guessing game…



With my first hypothesis dead on arrival, I turned back to Google to brush up on LATCH_EX. I found this Stack Exchange post, where the chosen correct answer stated that:

 

There are many reasons that can lead to exhaustion of worker threads:

  • Extensive long blocking chains causing SQL Server to run out of worker threads.
  • Extensive parallelism also leading to exhaustion of worker threads.
  • Extensive wait for any type of "lock" - spinlocks, latches. An orphaned spinlock is an example.


Well, I didn't have any long blocking chains, and I didn't see any CXPACKET waits. But I did see latches! So I began to hope that I wasn't crazy about this connection between latches and signal waits. I kept searching…

I found this sqlserverfaq.net link. It provided the query (below) that I used to identify my latch wait class as ACCESS_METHODS_DATASET_PARENT. It also broke latches down into three categories and identified that mine was a non-buffer latch. So I had a new keyword and a new search phrase: ACCESS_METHODS_DATASET_PARENT and "non-buffer latch".


SELECT latch_class,
       wait_time_ms / 1000.0 AS [Wait In sec],
       waiting_requests_count AS [Count of wait],
       100.0 * wait_time_ms / SUM(wait_time_ms) OVER() AS Percentage
FROM sys.dm_os_latch_stats
WHERE latch_class NOT IN ('BUFFER')
  AND wait_time_ms > 0;

Then I found this MSDN post. About halfway in, the author writes this about ACCESS_METHODS_DATASET_PARENT: "Although CXPacket waits are perhaps not our main concern, knowing our main latch class is used to synchronize child dataset access to the parent dataset during parallel operations, we can see we are facing a parallelism issue".


Then I found another blog post not only supporting the new theory, but also referencing a post by Paul Randal from SQLskills.com, one of the most reputable organizations regarding SQL Server performance. It states, "ACCESS_METHODS_DATASET_PARENT...This particular wait is created by parallelism...."


And for the icing on the cake, I found this tweet from SQLskills.com.  It may have been posted by Paul Randal himself.



So now I know that LATCH_EX shows up when SQL Server parallelizes table scans. Instead of one thread doing a table scan, I had several threads working together on each scan, so it started to make sense. I had ruled out parallelism because I didn't see any CXPACKET waits, which many DBAs think of as THE parallelism wait. And now THIS DBA (me) knows it's not the only parallelism wait! #LearnSomethingNewEveryDay
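One simple sanity check (my addition, not something I ran as part of the original experiment) is to rerun one of the scan queries with parallelism disabled and watch whether the LATCH_EX / ACCESS_METHODS_DATASET_PARENT waits disappear; again, the object names are hypothetical:

SELECT COUNT(*)
FROM dbo.OrderNotes WITH (NOLOCK)
WHERE NoteText LIKE '%monster%'
OPTION (MAXDOP 1);   -- force a serial plan; parallelism-only latch waits should vanish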

 

So now I feel confident I can explain how an abundance of LATCH_EX waits can result in high CPU signal waits.  But I'm still left wondering why signal waits can be over 20% and the CPU is only showing 50% utilization.  I'd like to tell you that I have an answer, or even a theory, but for now, I have a couple of hypotheses.

 

  1. It may be similar to comparing bandwidth and latency. Server CPU utilization is like bandwidth, i.e., how much work can be done vs. what is getting done, while signal wait is like latency, i.e., how long a piece of work waits before it begins. Both contribute to throughput, but in very different ways. If this is true, then perhaps the CPU workload for a query with LATCH_EX waits is not so much hard work as it is time-consuming and annoying. Like answering kids in the back seat who continually ask, "Are we there yet?" Not hard. Just annoying me and causing me to miss my exit.

  2. It may simply be that I had so little load on the server that the small amount of signal wait accounted for a larger percentage of the total. In other words, I may have had 8 threads experiencing signal wait at any time. Not a lot, but 8 of 35 threads is over 20%. So, in other words, "these are not the droids I was looking for." (A quick way to peek at this is sketched just after this list.)
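For hypothesis 2, one way to eyeball how many tasks are actually queued for CPU (which is what signal wait measures) is sys.dm_os_schedulers; this is a rough sketch, not something from the original investigation:

SELECT scheduler_id,
       current_tasks_count,
       runnable_tasks_count,    -- tasks in the runnable queue, i.e., waiting on CPU
       active_workers_count
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';  -- user schedulers only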

 

Maybe you have a hypothesis?  Or maybe you know that one or both of mine are wrong.  I welcome the discussion and I think other Thwack users would love to hear from you as well.


 

 

Related resources:

 

Article: Hardware or code? SQL Server Performance Examined — Most database performance issues result not from hardware constraint, but rather from poorly written queries and inefficiently designed indexes. In this article, database experts share their thoughts on the true cause of most database performance issues.

 

Whitepaper: Stop Throwing Hardware at SQL Server Performance — In this paper, Microsoft MVP Jason Strate and colleagues from Pragmatic Works discuss some ways to identify and improve performance problems without adding new CPUs, memory or storage.

 

Infographic: 8 Tips for Faster SQL Server Performance — Learn 8 things you can do to speed SQL Server performance without provisioning new hardware.

 


In a network, whether small or large, spread over one location or many, there are network administrators, system administrators, or network engineers who frequently access the IP address store. While many organizations still use spreadsheets, database programs, and other manual methods for IP address management, the same document/software is accessed and updated by multiple people. Network administrators take on the role of assigning IPs in small networks, as well as when they add new network devices or reconfigure existing ones. The system administrator takes care of assigning IPs to new users that join the network and adding new devices like printers, servers, VMs, DHCP and DNS services, etc. Larger networks that are spread over multiple locations sometimes have a dedicated person assigned to specifically manage planning, provisioning, and allocation of IP space for the organization. They also take care of research, design, and deployment of IPv6 in the network. Delegating IP management tasks to specific groups based on expertise or operations (network and systems teams) allows teams to work independently of each other and meet IP requirements faster.

 

Again, if the central IP address repository is maintained by a single person, then the problem lies in the delay in meeting these IP address requests. Furthermore, there can be human errors, and grievances stemming from teams experiencing downtime while waiting to complete their tasks.

 

What Could Go Wrong When Multiple Users Access the Same Spreadsheet?


Spreadsheets are an easily available and less expensive option for maintaining IP address data. But they come with their own downsides when multiple users access the same spreadsheet. Typically, users tend to save a copy to their local drive, and then finding the most recently updated version becomes another task! You end up with multiple worksheets with different data in each of them. There is no way to track who changed what. Ultimately, this leads to no accountability for misassignments or IP changes.

 

In short, this method is bound to produce errors and obsolete data, and it lacks security controls. There could be situations where an administrator changes the status of an IP address but forgets to communicate the change to the team or person that handles DHCP or DNS services. In turn, the chances are higher that duplicate IP addresses are assigned to a large group of users, causing IP conflicts and downtime.

 

With all that said, the questions that remain are: Can organizations afford the network downtime? And are the dollars saved by not investing in a good IP address management solution more than those lost due to lost productivity? This post discussed the problems of using manual methods for IP address management. In my next blog, we will look at associated issues and the best practices of roles and permissions that enable task delegation across teams.

 

Do you face similar difficulties with your IP administration? If yes, how are you tackling them?

vinod.mohan

Doing IT Remotely!

Posted by vinod.mohan May 14, 2015

Often, as organizations grow and expand, the job gets harder for IT teams. The IT infrastructure may become larger and more complicated, and be distributed across various sites and locations. For example, the end-users to support could be onsite, offsite, or even on the road travelling. There may not be enough admins in all locations, and remote IT management becomes essential.

  

Even in smaller businesses and start-ups where office space and IT infrastructure is not quite ready yet, and employees are telecommuting from home and elsewhere, the need for remote IT surfaces. A single IT pro wearing a dozen different IT hats will have to make do and support end-users wherever they may be.

  

Remote IT is generally defined in different ways by solution providers based on the solution they offer. In this blog, I am attempting to cover as many scenarios as possible that could be called remote IT.

 

SO, WHAT IS REMOTE IT?

  • IT pros in one location managing the infrastructure (network, systems, security, etc.) in remote location
  • IT  pros in one location supporting end-users in a remote location
  • IT pros within the network supporting end-users outside the network
  • IT pros monitoring and troubleshooting infrastructure issues while on the go, on vacation, or after office hours
  • Monitoring the health of remote servers, applications, and infrastructure on the Cloud
  • Remote monitoring and management (RMM) used by IT service providers to manage the IT infrastructure of their clients
  • User experience monitoring of websites and web applications—both real user monitoring and synthetic user monitoring
  • Site-to-site WAN monitoring to track the performance of devices from the perspective of remote locations
  • Certain organizations have their mobile device management (MDM) policies that include remote wiping of data on lost or stolen BYOD devices containing confidential corporate information

 

This may not be a comprehensive list. Please do add, in the comments below, what else you think fits in the realm of remote IT.

 

But the primary need for remote IT is this: without having to physically visit a remote site or user in person, we have to make IT work—monitor performance, diagnose faults, troubleshoot issues, support end-users, etc. And this should be done in a way that is both cost-effective and results-effective for the business.

 

Just as we need a phone or computer (a tool, basically) to communicate with a person in a remote location, making remote IT work comes down to using remote IT tools. When you’re equipped with the right tools and gear to manage IT remotely, you gain greater control and simplicity to work your IT mojo wherever the IT infrastructure is, the user is, or you—the IT pro—are.

 

Also, share with us what tools you use for doing IT remotely.

In my last post, "THERE MUST BE A BETTER WAY TO MANAGE VIRTUALIZED SYSTEMS", we talked about what systems are out there and which ones everyone is using. Ecklerwr1 posted a nice chart from VMware which compares VMware vRealize Operations to SolarWinds Virtualization Manager and a few others.



 

Based on the discussion, it seems like many people are using some kind of software to get things sorted in their virtual environment. In my previous job, I was responsible for parts of the lab infrastructure. We hosted 100+ VMs for customer support, so our employees could reproduce customer issues or use them for training.

 

While managing the lab and making sure we always had enough resources available, I found it difficult to identify which VMs were actively being used and which had been idle for some time. Another day-to-day activity was hunting down snapshots that consumed a massive amount of space.

Back then, we wrote some vSphere CLI scripts to get the job done. Not really efficiently, but done. However, using SolarWinds' Virtualization Manager now, I see how easy my life could have been.

 

My favorite features are the ability to view idle VMs and monitor the VM snapshots disk usage. Both features could have saved me lots of hours in my previous job. 

I am curious to know which features are saving you time on a regular basis. Or are there any features we are all missing but just don’t know it yet? As Jfrazier mentioned, maybe Virtual Reality Glasses?

If you are an Oracle DBA reading this, I am assuming all of your instances run on *nix and you are a shell scripting ninja. For my good friends in the SQL Server community, if you haven’t gotten up to speed on PowerShell, you really need to this time. Last week, Microsoft introduced the latest version of Windows Server 2016, and it does not come with a GUI. It's not a matter of clicking one thing to get a GUI; it's more like running through a complex set of steps on each server before you eventually get a graphical interface. Additionally, Microsoft has introduced an extremely minimal server OS called Nano Server that will be ideal for high-performance workloads that want to minimize OS resources.

 

One other thing to consider is automation and cloud computing—if you live in a Microsoft shop, this is all done through PowerShell, or maybe DOS (yes, some of us still use DOS for certain tasks). So my question for you is: how are you learning scripting? In a smaller shop the opportunities can be limited—I highly recommend the Scripting Guy’s blog. Also, doing small local operating system tasks via the command line is a great way to get started.

I was watching a recent webcast titled “Protecting AD Domain Admins with Logon Restrictions and Windows Security Log” with Randy Franklin Smith, where he talked about (and demonstrated) at length techniques for protecting and keeping an eye on admin credential usage. As he rightfully pointed out, no matter how many policies and compensating controls you put into place, at some point you really are trusting your fellow IT admins to do their job—but no more—with the level of access we grant and entrust to them.

 

However, there’s a huge catch-22—as an IT admin, I want to know you trust me to do my job, but I also have a level of access that could really do some damage (like the San Francisco admin who changed critical device passwords before he left). On top of that, the tools that help me and my fellow admins do our jobs can be turned into tools that help attackers access my network, like the jump box in Randy’s example from the webcast.

 

Now that I’ve got you all paranoid about your fellow admins (which is part of my job responsibilities as a security person), let’s talk techniques. The name of the game is: “trust, but verify.”

 

  1. Separation of duties: a classic technique which really sets you up for success down the road. Use dedicated domain admin/root access accounts separate from your normal everyday logon. In addition, use jump boxes and portals rather than flat out providing remote access to sensitive resources.
  2. Change management: our recent survey of federal IT admins showed that the more senior you are, the more you crave change management. Use maintenance windows, create and enforce change approval processes, and leave a “paper” trail of what’s changing.
  3. Monitor, monitor, monitor: here’s your opportunity to “verify.” You’ve got event and system logs, use them! Watch for potential misuse of your separation of duties (accidental OR malicious), unexpected access to your privileged accounts, maintenance outside of expected windows, and changes performed that don’t follow procedure.

 

The age-old battle of security vs. ease-of-use rages on, but in the real world, it’s crucial to find a middle ground that helps us get our jobs done while still respecting the risks at hand.

 

How do you handle the challenge of dealing with admin privileges in your environment?

 

Recommended Resources

 

REVIEW - UltimateWindowsSecurity Review of Log & Event Manager by Randy Franklin Smith -

 

VIDEO – Actively Defending Your Network with SolarWinds Log & Event Manager

 

RECOMMENDED DOWNLOAD – Log & Event Manager
