
Monitoring Central


Over the last decade, cybercriminals have gained the necessary resources to make it easier and more lucrative for them to attack small-to-medium-sized businesses. The 2019 Cost of a Data Breach Report not only shows the odds of experiencing a data breach have gone up by a third in less than a decade, but the cost of these data breaches is also on the rise. Additionally, small businesses face disproportionately larger costs than their enterprise counterparts when an attack is successful. This report highlights the importance of SMBs being prepared, now more than ever, to quickly identify and respond to potential cyberattacks.

 

One common way businesses increase their security posture is by implementing, and using, a Security Information and Event Management tool—SIEM for short. At its core, a SIEM solution aggregates and normalizes log and event data from across an entire network, making it easier to identify and respond to attacks, compromised data, and security threats.

 

However, many SMBs feel a SIEM solution is out of reach for their organizations for three main reasons:

 

  1. Complexity
    The complexity starts right away with most traditional SIEM vendors. Connecting different log sources often requires building parsers or writing (and possibly learning) RegEx to ingest and normalize log data. Once the data has been consolidated, recalling the data adds another layer of complexity. For example, wanting to see logins from a particular user can require writing a query in a language created specifically for that SIEM. Additionally, feature bloat often makes it difficult to know how to find answers to simple questions.

  2. Expertise Requirements
    A SIEM is only as effective as the rules put in place to identify, alert on, and respond to potential threats. Without a deep understanding of the types of activities captured by logs, and of the behaviors that indicate malicious or risky activity, setting up the rules can be daunting, especially if the SIEM doesn’t come with any pre-built rules. With limited time, and a scarcity of available security professionals, setting up a SIEM can seem like too big of a project to take on.

  3. Expense
    Aggregating all log and event data in one place is ideal. However, the licensing models of many SIEM solutions can quickly price out SMBs. Many of the most common SIEM solutions on the market are SaaS products, and the price changes based on the log volume being sent to the product. This leads to two main problems: unpredictable pricing, and IT pros needing to cherry-pick which logs they collect and store…and hoping they pick the right ones.

 

At SolarWinds we understand how important it is for IT pros at SMBs to gain valuable time back and automate as much as possible—including threat detection and response. That’s why we built Security Event Manager (SEM). It’s a SIEM solution built for resource-constrained IT pros needing to advance their organization’s security beyond patching, backups, and firewall configurations. SEM is designed to provide the most essential functions of a SIEM to help improve security posture, more easily meet compliance requirements, and reduce the time and complexity of an audit.

 

How Does SolarWinds Security Event Manager Differ From Other SIEM Products?

  1. Easy to Deploy and Use
    Deployment is flexible via a virtual appliance that can be located on-premises or in the public cloud (such as Azure or AWS). Many users report SEM is up and running within fifteen minutes, no professional services required. Log collection and normalization is done either by enabling one or more of the hundreds of pre-built connectors and sending logs to SEM, or by deploying the SEM agent.

    It has a simple and clean UI focused on the features SMBs find most important, such as the dashboard, which helps visualize important trends and patterns in log and event data:

    As well as a quick and easy keyword search providing faster log recall without the need to learn specialized query languages:


  2. Provides Expertise and Value Out of the Box
    Finding value with the tool will not be an issue. An integrated threat intelligence feed and hundreds of pre-defined filters, rules, and responses not only make it faster and easier for users to identify threats, but also automate notifications and corrective actions.

    Beyond identifying and responding to threats, the pre-built reports make demonstrating compliance a breeze.

    The best part is users aren’t confined to out-of-the-box content. As their organization’s needs change and grow, or as they become even better acquainted with the tool, the pre-defined content, visualizations, and reports are flexible.

  3. Priced With SMBs in Mind
    SolarWinds® Security Event Manager has a simple licensing model. SEM is licensed by the number of log-emitting sources sent to the tool. No need to pick and choose which logs to send, and no need to worry about a large influx of logs breaking your budget. Users get all the features of SEM and industry-leading support for a single price. The pricing model is built to scale with the user’s environment, with the price per node dropping at higher tiers. For those looking to monitor workstations, infrastructure, and applications, special discounted pricing is available. Same deal, one price for all features, for each workstation.

 

If you’re an IT pro at an SMB looking to get a better handle on cyber security or compliance reporting, give SEM a shot. You can download a free, 30-day trial here.

Background

 

This blog initially started out as an examination of how SolarWinds uses Database Performance Analyzer (DPA) within our own production environment. It now includes not only how our DBA uses DPA, but how other business units within SolarWinds use it and why. It isn’t surprising to find people in IT operations and application development using DPA, since our own customer studies have shown a high number of people outside of the DBA role use it, too.

 

Recent product-specific studies for DPA showed a high number of DevOps/IT Ops and AppDev roles using the product and an eye-opening, broad customer census exposed even more. In the 2019 THWACK Member Census, we asked over 2,200 IT professionals to select their primary job role and only 2.4% selected DBA. Interestingly, when we asked respondents if they managed or monitored databases, 42.7% said yes.

 

This brings to light the discussion of the “accidental DBA” and some interesting changes in IT organizations. First is the growth in number of DevOps people who handle database-related tasks. Second is the importance of databases as the platform for most mission-critical applications and why everyone has a keen interest in their availability and performance. And last, but not least, the number of DBAs is going down according to Computer Economics, who has seen the percentage of DBAs relative to total IT staff drop to 2.8% in 2017 from 3.3% in 2013. Our own Head Geek, Thomas LaRock, wrote an article pointing out the number of DBA jobs has stagnated for almost 20 years. On the flipside, Gartner pointed out that DBMS (Database Management Systems) revenue grew an astounding 18.4% to $46 billion in 2018.

 

Armed with this information, I decided I’d investigate the SolarWinds DBA team and see if any of these trends held true.

 

Let’s Start With the DBA

 

As I mentioned, I initially thought I’d interview the DBA team here at SolarWinds to see how we “drink our own champagne,” since I knew DPA was used by our internal IT team. As it turns out, the “DBA Team” is one person. I guess for a company that did $833 million in revenue in 2018 I expected an entire DBA organization, not just one hardworking DBA. But maybe this isn’t the exception?

 

I learned a lot from our DBA about how she can keep track of over 250 Microsoft SQL Server databases running on a mix of physical and virtual machines. My biggest takeaway from talking to her was that DBAs don’t “monitor databases.” They want to be alerted when there are problems, and they need a product to help them quickly find and resolve problems when they arise. They also want a product to help them optimize their databases proactively.

 

The first thing we discussed was “what’s important and who is it important to?” Here are the top things SolarWinds uses DPA for and the primary users:

 

  • Overall database health: DBA and IT Ops
  • Debugging after deployment: AppDev and DBA
  • Ad hoc troubleshooting: DBA and AppDev
  • Capacity planning: DBA

After I learned about the overall database environment (250+ SQL Server databases), I wanted to understand specific, real-world use cases of DPA in action.

 

DBA Usage Scenarios

 

So how does the DBA at SolarWinds use DPA? First, she sets up alerts, so she can immediately be sent text notifications from DPA if something goes awry. DPA has had alert notification for a while, but the 2019.4 release made it even easier via a “drag and drop” interface, making alert customization simple. Second, DPA is the first place she goes to when she gets notified about something going wrong, whether it’s an alert, phone call, email, or a help desk ticket opened and assigned to her.

 

Scenario 1 of 2

 

In this first real-life scenario, our DBA was alerted to an “assertion check fail” pointing to possible corruption. The SQL Server instance itself created a hard-to-decipher stack dump, and the only noticeable thing she could pick out of it was the process ID.

With this in hand, she went into DPA to the specific time the event occurred in the SQL Server instance. Since DPA provides both real-time and historical data, she was able to drill down to find 1) the session ID executing this query, and 2) the SQL script running and the database. After speaking with the developer who ran the query, she determined it was a problem with SQL Server itself and asked the developer to refrain from running the query until they got the problem resolved by Microsoft.

 

*Screenshot the SolarWinds DBA used to find the culprit of the stack dump SQL Server generated.

 

Scenario 2 of 2

 

This second use case brings to light how important DPA is for establishing the overall health of a database and for capacity planning. Our DBA could not stress enough how important it was for her to know the baseline of a database instance and associated queries. From the baselines DPA develops, with the help of machine learning, she can know what a typical day looks like and the behavior of typical database activity. This allows her to spot both anomalies and trends.

 

Regarding capacity planning, she uses DPA to monitor the utilization and performance of applications and make note of trends she uses for future capacity requirements such as new or additional servers. Luckily, SolarWinds does a quarterly two-week freeze on new applications and changes, and this two-week period gives her a chance to go through DPA reports and proactively tune the environment. DPA’s anomaly detection powered by machine learning is a great way to graphically see the biggest opportunities for proactive optimization.

 

*This resource tab in DPA is a favorite of our DBA because it gives her a good overview of server resources being used.

 

Our DBA believes DPA will be even more useful as SolarWinds starts to migrate databases to Azure PaaS. As she stated, being on top of performance issues like poorly written SQL and poor performing tables doesn’t go away, and the cost of making mistakes, especially those consuming resources, can lead to spikes in usage charges.

 

Application Development and DPA

 

As I mentioned at the beginning, I learned a lot about how DPA is used at SolarWinds and the various people and departments using it. The application development (AppDev) team is one of the bigger teams in need of the data DPA provides. Why? Because they, along with our DBA, are constantly deploying changes and want to see the difference.

 

For example, is the SQL query running slower or faster than before? As previously mentioned, some people are “accidental DBAs,” so if the query they implemented ran fine on a QA instance but in production performs poorly, they need to know why. Case in point, this exact scenario happened recently and was due to a missing index DPA quickly pointed out. As our DBA stressed, for someone not very experienced with index recommendations, the tuning advisors in DPA can be a life-saver.

 

*One of the most popular DPA pages for before-and-after comparisons is the overall waits view, which is great for seeing changes in performance after a deployment.

 

Finally, IT Operations

 

At SolarWinds, IT Operations (IT Ops) is where the buck stops for overall system availability, and just like our DBA, they make extensive use of alerts. Depending on the alert, they may send a priority 3 email when something has reached a certain threshold. But if SQL Server were down, they would send an email as well as trigger a page through Opsgenie, which then goes to the primary person on call and posts a message on Microsoft Teams. The IT Ops group also has certain alerts integrated with SolarWinds® Service Desk to automatically open tickets.

 

But what about databases and their health…does IT Ops care? The answer is yes because they rely on the DPA integration with SolarWinds Server & Application Monitor (SAM) to find the cause of performance issues on servers or when someone complains about application performance. Since DPA and SAM integrate with the SolarWinds Orion® Platform, you can navigate seamlessly between the products.

 

For example, they used the SAM integration to track a CPU spike on a server to a SQL Server database instance in a critical state. In this case, they immediately reached out to the SolarWinds DBA because they could tell the issue with the server was related to the database. However, if the DBA is unavailable, they rely on the suggestions and recommendations in DPA to diagnose the problem and take action or provide further documentation for either our DBA or AppDev.

 

Just as DBA and AppDev look for signs of abnormality, IT Ops looks at historical trends to find issues that may correlate to database issues. The integration of SAM and DPA makes this simple.

 

*IT Ops uses this page in Server & Application Monitor to see trends and then drill down and isolate the root cause. SAM’s integration with DPA makes this simple.

 

Summary

 

As stated in the introduction, the role of the DBA is changing and many people without a DBA title are involved with the performance of database applications. With the movement of database instances to IaaS and PaaS implementations, the ability to optimize, find, and resolve performance issues doesn’t go away. In some ways it becomes more important due to the potential impact on OpEx (aka your monthly Azure bill).

Change control. In theory it works. However, there’s always one person who thinks the process doesn’t apply to them. Their justification for going rogue may sound something like, “There’s no time to wait, this has to be done now,” and, “This is a small change, it won’t impact anything else,” or maybe, “This change will make things better.”

 

But at the end of the day, those changes inevitably end up crashing a service, slowing application performance, or even worse, opening new vulnerabilities. The call will come in: something’s broken, magically no one will know why on earth it’s happening, and they certainly won’t be able to remember if any changes occurred…or who made a change. There goes the rest of your day, looking for the root cause of an issue created by one of your own coworkers.

 

Recently, Head Geeks Thomas LaRock sqlrockstar and Leon Adato adatole hosted a THWACKcamp session on this exact topic. In their scenario the culprit was “Brad the DBA.” At SolarWinds, we understand this all-too-common scenario and have a tool designed to help.

 

SolarWinds® Server Configuration Monitor (SCM) provides an easy-to-use and affordable way to track when server or application configuration changes are being made, who’s making the changes, and what the differences are between the old configuration and the new one. It detects, tracks, and alerts on changes to things like hardware, software, operating systems, text and binary files, the Windows Registry, and script outputs on Windows® and Linux® servers.

 

Additionally, SCM is an Orion Platform-based module, meaning you can quickly correlate configuration changes with infrastructure and application performance metrics in a single view, helping you confirm or rule out a configuration change as the culprit.

 

These capabilities help provide you with the visibility needed to not only remediate issues faster, but also hold non-process-abiding team members accountable for their actions. If you’re tired of the shenanigans created by your colleagues not following the change control process for your servers and applications, check out a free, 30-day trial of Server Configuration Monitor. And just for fun, if you have a good story of how “Brad” broke your day, feel free to share below!

PASS Summit 2019 is here and SolarWinds will be at the conference, booth #416, November 5 – 8 in Seattle, Washington.

 

We’ll be showcasing the latest release of SolarWinds® Database Performance Analyzer (DPA), including our just-announced support for Azure® SQL Database Managed Instance and SQL Server® 2019. No matter if you’re currently using DPA for your cross-platform database performance monitoring, or if you’re a “casual DBA,” stop by our booth for a demo to see the great new features we’ve added to this release.

 

And if you’re currently using any other SolarWinds products capable of integrating with the Orion® Platform, such as Server & Application Monitor (SAM) or Virtualization Manager (VMAN), ask us how DPA seamlessly integrates with other products to give you a complete end-to-end view of your database applications.

 

Lastly, we’ve got two SolarWinds-sponsored events at the conference you should put on your calendar. One is the first timer’s reception we’re sponsoring Tuesday, November 5, from 4:45 – 6 p.m. in ballroom 6E at the convention center. The second is our presentation “SQL Server Performance Analysis Powered by Machine Learning,” Wednesday, November 6, at 1:30 p.m. in room 618 at the convention center.

 

Stop by and say hi. In addition to product demos, we’ll be giving away some cool swag.

After all the education, all the measures, there are two almost inevitable truths in IT security: Some users will reuse their passwords and data breaches will happen—leading to exposed and misused credentials.

 

These two facts combined spell trouble. Account takeover kind of trouble.

 

Your own password strategy is likely solid. You use strong, unique passwords and have enabled multi-factor authentication wherever you could—modern-day table stakes, right?

 

Do you think everyone in your organization is as careful? You never know whether their Outlook or Salesforce password is the same one they used to sign up for a Smash Mouth fan forum running on vBulletin, last updated in 2011.

 

We’re not even talking about your organization when it comes to data breaches—the IT pros do their best to ensure breaches won’t happen. But that vBulletin forum is easy pickings. Once breached, the hacker has half of what they need—the password. Even serious services and organizations are regularly breached and pilfered.

 

Mix these two together, add some credential stuffing tools, and you have an account takeover—using legitimate, but stolen, credentials to access your systems and data.

 

So what can you do about it? Other than thinking “We’re not an interesting-enough target” (false) or that it’s not something to worry about (also false).

 

We’re excited to have a new tool to help you get some power back into your hands: SolarWinds® Identity Monitor. We’ve partnered with the security experts at SpyCloud to give you a chance to act before the bad actors do.

 

In true SolarWinds fashion, Identity Monitor is an easy-to-use product:

You sign up with your business email address, add any other company email domains to a watchlist, and then get notified whenever emails and credentials belonging to your organization appear in a data breach. You also have access to extensive past breach history for your monitored domains.

 

With an early notification, you can act to protect your accounts from account takeover attempts, for example, by forcing a password reset.

 

Ready to see where you stand? Start with a free check of your email breach exposure and see the stats of how many personal records in your domain were exposed and when. Fingers crossed!

It sounds easy enough to an IT pro: is it an applications ticket or a hardware ticket?

 

Simple enough. Why is this one question so important? For starters, it’s how you track the performance and success of the teams who provide internal support. In addition, collecting simple data points like “category” and “subcategory” can drive a better, faster experience for all the employees in your organization.

 

The problem is, the sales manager (or accountant, or creative designer) doesn’t know the difference between an application support issue and hardware breakdown, and they’re not familiar with IT’s processes that rely on such information.

 

That’s where artificial intelligence (AI) can help. SolarWinds® Service Desk uses AI in a few different ways; suggested data points for tickets were the first AI-powered functionality we introduced. Suggested categories and subcategories provide users some direction based on keywords within the subject or description of their ticket and the composite history of ticket data in the system. The goal is for requesters, regardless of their tech understanding, to enter complete and accurate ticket data thanks to those suggestions.

 

This data can drive automated ticket routing and priority. It can empower your teams to carve out unique SLAs for specific types of tickets (and trust the tickets are categorized correctly). It can make the difference between granular, accurate performance data and, well, this:

 

This might look familiar: the dreaded “other” category. When users (or agents, for that matter) don’t know how to identify the category, your reports will look something like this. It’s time to say goodbye to this useless chart. AI will see it to the door by suggesting the correct data points up front.

 

Let’s look at some use cases for AI in action.

 

Powering Ticket Automations

One of the most important sections in the configuration of a SolarWinds Service Desk environment is the automations engine. This is where you’ll identify types of tickets you can route directly to a certain group, keywords indicating high priority, or breaches requiring specific escalation processes.

 

Those automated actions depend on data collection when a ticket is entered. The information the user enters will correspond directly with an action, so it needs to be correct for an automation rule to work.

 

This is where AI can help. As you can see from the next example, there are suggested categories and subcategories as soon as they click those required drop-downs. The suggestions are based on the information they’ve already entered combined with historical data from the service desk environment. When they choose “hardware” and “laptop” with the help of those AI-powered suggestions, the custom fields appear for “battery type” and “type of laptop.”

 

Why is this important?

 

You can create an automation rule to route these tickets directly to the appropriate support group. The AI-powered suggestion unlocks those custom fields to help you pinpoint the exact nature of the issue.

 

In the example below, you’ll see an automation rule to route Mac device issues directly to the “Mac Device Technical Support” group.

 

With the help of AI-powered suggestions, you’ll receive the crucial piece of information driving the automation rule. Now these tickets will skip the general queue and arrive instantly with the “Mac Device Technical Support” group. You’ve saved the requester time waiting for a resolution and your IT team time parsing through a general ticket queue.

 

Self-Service and Suggested Request Forms

 

Requesters may not realize the benefit of the suggested categories because they’re unfamiliar with how your teams use the data. But in this next example of AI in the service desk, the benefits will be plainly evident.

 

This is where your service catalog and knowledge base reach their maximum potential. For a long time, it was very difficult for IT to encourage users to leverage self-service articles or request forms driving workflows. Simply put, some users will always default to creating a ticket, no matter what resources might be available through the portal.

 

When IT replies to the ticket with a link to a request form, the user now needs to complete the form after already submitting a ticket. That’s a poor experience, and it’s time wasted on both ends.

 

To simplify the experience, you can meet them with suggested resources wherever they are in the portal. If they like to use the search bar, suggested service catalog forms and self-service articles will appear. If a requester is the “always submit a ticket” type, AI-powered suggestions will pop up with request forms or knowledge articles as they fill out the subject line of the ticket.

 

Not only are you anticipating the service they need, but you’re giving them every single opportunity to leverage the resources your team has made available.

 

So, for now, there are three major benefits AI has brought to the service desk:

1) Complete and accurate data collection to drive automation and reporting

2) Access to appropriate request forms, driving automated workflows

3) Opportunities to self-resolve

 

As this technology grows, so will the possibilities for proactive measures to save time and avoid disruption to employees who depend on the technology you support.

In the past, the importance of access rights management had to wait in line behind trending topics like hybrid infrastructures, digitalization, cloud, and the latest new tools the C-level wants to have and implement. As a result, access rights management in companies often lacks transparency, is organically grown, and doesn’t follow best practices like the principle of least privilege.

Even though managing user access rights is an essential part of every administrator’s work, there are different ways of doing it. However, looking at all the systems, tools, and scripts out there, most admins share the same big pain points.

Earlier this year, we asked our THWACK® community about their biggest pain points when it comes to access rights management and auditing. Turns out the biggest factors are:

 

  1. Moving, adding, or changing permissions
  2. Running an audit/proving compliance
  3. Understanding recursive group memberships

 

1. Moving, Adding, or Changing Permissions

 

The flexibility of today’s working world requires a well thought-out user provisioning process. Whether for a new user, a short-term assignment, department changes, or temporary projects, the expectations of an IT group are to accurately and quickly provision users while helping to maintain data security.

IT departments are typically responsible for securing a network, managing access to resources, and keeping an overview of permissions and access rights policies. Therefore, they should use a provisioning framework. SolarWinds® Access Rights Manager (ARM) is designed to help address the user provisioning process across three phases—joiners, movers, and leavers.

 

SolarWinds Access Rights Manager not only helps automate the joiner or initial provisioning phase, it also allows admins to quickly perform changes and remediate access rights while enabling data owners.

 

Creating and Moving User Access Permissions

 

With ARM, you can control the creation of new user accounts, rights management, and account details editing.

Its user provisioning tool allows you to set up new users typically within seconds. Users are generated in a standardized manner and in conformity with the roles in your company. The access rights to file servers, SharePoint sites, and Exchange as defined in the AD groups are issued at the same time. ARM generates a suitable email account so the new colleague can start work immediately. You can schedule the activation to prepare for the event in the future or to limit the access period for project work. Whether help desk or data owner, participants work with a reduced, simple interface in both cases.

All access rights are set up in a few steps.

 

On the start screen under “User Provisioning,” you can choose from the most important quick links for:

 

  • Creating a user or a group
  • Editing group memberships
  • Editing access rights for resources

 

By choosing “Create new user or group,” ARM allows you to create a user or group based on preset templates. These user and group templates have to be created individually one time after installing ARM.

 

For further information, please download our whitepaper: Joiner, Mover, Leaver: User Provisioning With SolarWinds Access Rights Manager

 

2. Running an Audit/Proving Compliance

 

With ARM, you can create reports on either users/groups or resources, along with further filters.

Just looking at reports for Active Directory, you could create views for:

 

  • Where users and groups have access
  • Employees of manager
  • Display user account details
  • Find inactive accounts
  • OU members and group memberships
  • User and group report
  • Identify local accounts
  • And many more

 

While creating a report, you can set different selections such as the users or groups you’d like to report on and the resources you would like details about.

 

Additionally, you can set up scheduled reports, which you can send directly via email to yourself, your auditor, or your direct line if needed.

 

To gain more insight into the reporting capabilities of ARM, please see our whitepaper: Top 7 Audit-Prep Reports

 

3. Understanding Recursive Group Memberships

 

Groups can be members of other groups. Active Directory allows "children" to become "parents" within their own family tree. If the nested group structure loops in a circular way, group membership assignments become ineffective and nonsensical. Through these recursions or circular nested groups, every user who is a member of any of the recursive groups is granted all the access rights of all the groups. The consequence is a confusing mess of excessive access rights.

ARM automatically identifies all recursions in your system. We highly recommend removing the recursion by breaking the chain of circular group memberships.

 

ARM not only allows you to see circular or recursive groups, but directly correct group memberships and dissolve recursions.

To keep an eye on the most common access-based risk levels, ARM provides a risk assessment dashboard with the eight biggest risk factors and lets you correct your individual risk levels right away.

 

Get your free ARM trial and do your risk assessment here.

We are happy to announce the arrival of the brand-new SolarWinds NAT Lookup free tool.

 

We know how frustrating it can be when your users can’t even get through the firewall to their own network. SolarWinds NAT Lookup helps simplify the network address translation lookup process, so you can get users past firewall translation issues, prevent overlapping policies that cause incorrect translations, and effectively troubleshoot address issues.

 

So, what exactly can you do with this new addition to the SolarWinds free tool catalog?

 

  1. Search an IP address across one or multiple Palo Alto firewalls
     Quickly perform an IP lookup on one or multiple Palo Alto firewalls to verify IP translation information.

  2. Troubleshoot and verify NAT policy configuration
     Render complete lists of all NAT policies that apply to a searched address to spot overlapping policies, cross-reference the policy configuration to live session traffic, and see the order of overlapping NAT policies.

  3. See live session traffic per firewall for the translated address
     Gain insight into live session traffic per firewall for each translated address to help ensure the policy configuration matches observed behavior and performance.

  4. Export information for record keeping
     Keep historical records of each executed search by exporting the policy configurations from the tool into CSV file format.

 

Additional Features:

  • Removes the need for users to have direct access to firewalls
  • Easy distribution to other IT groups instead of granting direct access to your sensitive firewalls
  • Helps users identify dedicated translated addresses

How do you plan to use SolarWinds NAT Lookup? We’d love to hear from you, so feel free to post your use cases in the comments below.

 

If you’d like to get the nitty-gritty, in-depth info about NAT Lookup and how you can make the most of it, check out this article.

Synthetic user monitoring is a technique that simulates user transactions—or common paths on websites—so the administrator can watch for performance issues. These transactions are meant to represent how a user might be experiencing the site. For instance, is a potential customer getting an error when they add an item to their cart? Is a specific page loading slowly or not loading at all? These are things that can affect your bottom line and result in unplanned fire drills.

 

Synthetic user monitoring should not be confused with Real User Monitoring. Real User Monitoring captures and analyzes transactions from real users on a site. It helps you understand load times for your pages from browsers in their actual locations.

 

These approaches provide different perspectives on web performance. Each has its benefits, but today—in honor of the release of Web Performance Monitor 3.0—we’re going to focus on situations when synthetic user monitoring is a good choice.

 

Find Performance Issues Before They Cause Problems for Your Users

IT infrastructure monitoring tools are great at telling you if a server or a service is up or down, but users might still be frustrated even if these things look OK. Synthetic user experience monitoring tools let you see if an overall transaction is working (can a user purchase something from your site?) or if a certain step is having trouble (when I click “buy” my payment processing is hanging). Once you’re alerted, you can go into troubleshooting mode with the specifics of what your users are seeing to minimize the impact. Plus, you can continuously run these tests from multiple locations to ensure things are working where your users are. 

 

Benchmark Your Site’s Performance to Identify Areas for Improvement

As mentioned, synthetic user experience monitoring tools can watch your websites from multiple locations at frequencies of your choice. Seeing this data over time can help you identify areas to optimize going forward. Waterfall charts can be particularly helpful to pinpoint performance bottlenecks over time.

 

Monitor the Performance of Critical SaaS Applications From Inside Your Firewall

Most companies rely on third-party SaaS applications to run some aspects of their business. For instance, your sales team may be using a SaaS CRM solution to drive and track their daily activities. It’s critical to know if your coworkers are having issues getting what they need. While you don’t own the app, you’re the one they’ll come to when they have issues. A common scenario is setting up a transaction to make sure a valid user can log in successfully and be alerted if it fails.

 

Knowing about failures or performance issues before your users can save you time and frustration. Synthetic user experience monitoring can help when it comes to websites and web-based applications. How have you used it? Comment below and let us know.

It’s easy to recognize problems in Ruby on Rails, but finding each problem’s source can be a challenging task. A problem due to an unexpected event could result in hours of searching through log files and attempting to reproduce the issue. Poor logs will leave you searching, while a helpful log can assist you in finding the cause right away.

 

Ruby on Rails applications automatically create and maintain basic text logs for each environment, such as development, staging, and production. You can easily format and add extra information to the logs using open-source logging libraries, such as Lograge and Fluentd. These libraries work well for small applications, but as you scale an application across many servers, you’ll need to aggregate logs to troubleshoot problems across all of them.

 

In this tutorial, we will show you how Ruby on Rails applications handle logging natively. Then, we’ll show you how to send the logs to SolarWinds® Papertrail™. This log management solution enables you to centralize your logs in the cloud and provides helpful features like fast search, alerts, and more.

 

Ruby on Rails Native Logging

Ruby offers a built-in logging system. To use it, simply include the following code snippet in your environment.rb (development.rb/production.rb). You can find environments under the config directory of the root project.

config.logger = Logger.new(STDOUT)

Or you can include the following in an initializer:

Rails.logger = Logger.new(STDOUT)

By default, each log is created under #{Rails.root}/log/ and the log file is named after the environment in which the application is running. The default format gives basic information that includes the date/time of log generation and the description (message or exception) of the log.

D, [2018-08-31T14:12:44.116332 #28944] DEBUG -- : Debug message
I, [2018-08-31T14:12:44.117330 #28944]  INFO -- : Test message
F, [2018-08-31T14:12:44.118348 #28944] FATAL -- : Terminating application, raised unrecoverable error!!!
F, [2018-08-31T14:12:44.122350 #28944] FATAL -- : Exception (something bad happened!):

Each log line also includes the severity, otherwise known as the log level. Log levels enable you to filter the logs when outputting them or when monitoring problems, such as errors or fatal events. The available log levels are :debug, :info, :warn, :error, :fatal, and :unknown. These are converted to uppercase when output in the log file.
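
For instance, here’s a minimal sketch (with placeholder message text) you could run from a Rails console to see each severity in action; the logger drops anything below the configured config.log_level:

logger = Rails.logger

logger.debug "Debug message"      # development-level detail
logger.info  "Test message"       # routine events
logger.warn  "Something looks off"
logger.error "Something failed"
logger.fatal "Terminating application, raised unrecoverable error!!!"
logger.unknown "Always recorded, regardless of the configured level"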

 

Formatting Logs Using Lograge

The default logging in Ruby on Rails during development or in production can be noisy, as you can see below. It also records a limited amount of information for each page view.

I, [2018-08-31T14:37:44.588288 #27948]  INFO -- : method=GET path=/ format=html controller=Rails::WelcomeController action=index status=200 duration=105.06 view=51.52 db=0.00 params={'controller'=>'rails/welcome', 'action'=>'index'} headers=#<ActionDispatch::Http::Headers:0x046ab950> view_runtime=51.52 db_runtime=0

Lograge adds extra detail and uses a format that is less human readable, but more useful for large-scale analysis through its JSON output option. JSON makes it easier to search, filter, and summarize large volumes of logs. The discrete fields facilitate the process of searching through logs and filtering for the information you need.

I, [2018-08-31T14:51:54.603784 #17752]  INFO -- : {'method':'GET','path':'/','format':'html','controller':'Rails::WelcomeController','action':'index','status':200,'duration':104.06,'view':51.99,'db':0.0,'params':{'controller':'rails/welcome','action':'index'},'headers':'#<ActionDispatch::Http::Headers:0x03b75520>','view_runtime':51.98726899106987,'db_runtime':0}

In order to configure Lograge in a Ruby on Rails app, you need to follow some simple steps:

Step 1: Find the Gemfile under the project root directory and add the following gem.

gem 'lograge'

Step 2: Enable Lograge in each relevant environment (development, production, staging) or in an initializer. Both the environment files and the initializers directory live under the config directory of your project.

# config/initializers/lograge.rb
# OR
# config/environments/production.rb
Rails.application.configure do
  config.lograge.enabled = true
end

Step 3: If you’re using Rails 5’s API-only mode and inherit from ActionController::API, you must define it as the controller base class that Lograge will patch:

# config/initializers/lograge.rb
Rails.application.configure do
  config.lograge.base_controller_class = 'ActionController::API'
end

With Lograge, you can include additional attributes in log messages, like user ID or request ID, host, source IP, etc. You can read the Lograge documentation to get more information.


Here’s a simple example that captures three attributes:

class ApplicationController < ActionController::Base
  before_action :append_info_to_payload

  def append_info_to_payload(payload)
    super
    payload[:user_id] = current_user.try(:id)
    payload[:host] = request.host
    payload[:source_ip] = request.remote_ip
  end
end

The above three attributes are logged in environment.rb (production.rb/development.rb) with this block.

config.lograge.custom_options = lambda do |event|
  event.payload
end

Troubleshoot Problems Faster Using Papertrail

Papertrail is a popular cloud-hosted log management service that integrates with different logging library solutions. It makes it easier to centralize all your Ruby on Rails log management in the cloud. You can quickly track real-time activity, making it easier to identify and troubleshoot issues in production applications.

 

Papertrail provides numerous features for handling Ruby on Rails log files, including:

 

Instant log visibility: Papertrail provides fast search and team-wide access. It also provides analytics reporting and webhook monitoring, which can typically be set up in less than a minute.

 

Aggregate logs: Papertrail aggregates logs across your entire deployment, making them available from a single location. It provides you with an easy way to access logs, including application logs, database logs, Apache logs, and more.


 

Tail and search logs: Papertrail lets you tail logs in real time from multiple devices. With the help of advanced searching and filtering tools, you can quickly troubleshoot issues in a production environment.

 

Proactive alert notifications: Almost every application has critical events that require human attention. That’s precisely why alerts exist. Papertrail gives you the ability to receive alerts via email, Slack, Librato®, PagerDuty, or any custom HTTP webhooks of your choice.


 

Log archives: You can load the Papertrail log archives into third-party utilities, such as Redshift or Hadoop.

 

Logs scalability: With Papertrail, you can scale your log volume and desired searchable duration.

 

Encryption: For your security, Papertrail supports optional TLS encryption and certificate-based destination host verification.

Configuring Ruby on Rails to Send Logs to Papertrail

It’s an easy task to get started with Papertrail. If you already have log files, you can send them to Papertrail using Nxlog or remote_syslog2. These utilities monitor the log files and send new logs to Papertrail. Next, we’ll show you how to send events asynchronously from Ruby on Rails using the remote_syslog_logger gem.

Add the remote_syslog_logger gem to your Gemfile. If you are not using a Gemfile, run the following command:

$ gem install remote_syslog_logger

Change the environment configuration file to log via remote_syslog_logger. This is almost always in config/environment.rb (to affect all environments) or config/environments/<environment name>.rb, such as config/environments/production.rb (to affect only a specific environment). Update the host and port to the ones given to you in your Papertrail log destination settings.

config.logger = RemoteSyslogLogger.new('logsN.papertrailapp.com', XXXXX)

It’s that simple! Your logs should now be sent to Papertrail.

 

Papertrail is designed to help you troubleshoot customer problems, resolve error messages, improve slow database queries, and more. It gives you analytical tools to help identify and resolve system anomalies and potential security issues. Learn more about how Papertrail can give you frustration-free log management in the cloud, and sign up for a trial or the free plan to get started.

Slow websites on your mobile device are frustrating when you’re trying to look up something quickly. When a page takes forever to load, it’s often due to a spotty network connection or a website that is overly complicated for a phone. Websites that load many images or videos can also eat up your data plan. Most people have a monthly cap on the amount of data they can use, and it can be expensive to pay an overage fee or upgrade your plan.

 

Can switching to a different browser app truly help websites load faster and use less data? We’ll put the most popular mobile browsers to the test to see which is the fastest and uses the least data. Most people use their phone’s default browser app, like Safari on iPhone or Chrome on Android. Other browsers, like Firefox Focus and Puffin, claim to be better at saving data. Let’s see which one comes out on top with our testing.

 

How We Benchmark

We’ll look specifically at page-load performance by testing three popular websites with different styles of content. The first will be the Google home page, which should load quickly as Google designed it to be fast. Next, we’ll measure the popular social media website Reddit. Lastly, we’ll test BuzzFeed, a complex website with many ads and trackers.

 

To conduct these tests, we’ll use an Apple iPhone 7. (We may look at other phones such as Android in future articles.) We’ll use the browsers with default settings and clear any private browsing data so cached data won’t change the results.

 

Since we don’t have access to the browser developer tools we’d typically have on a desktop, we’ll need to use a different technique. One way is to time how long it takes to download the page, but some websites preload data in the background to make your next click load faster. From the user’s perspective, this shouldn’t count toward the page-load time because it happens behind the scenes. A better way is to record a video of each page loading. We can then play them back and see how long each took to load all the visible content.

 

To see how much data each browser used, we’ll use something called a “proxy server” to monitor the phone’s connections. Normally, phones load data directly through the cellular carrier’s LTE connection or through a router’s Wi-Fi connection. A proxy server acts like a man in the middle, letting us count how much data passes between the website and the phone. It also lets us see which websites it loaded data from and even the contents of the data.

 

We’ll use the proxy server software called Fiddler. This tool also enables us to decrypt the HTTPS connection to the website and spy on exactly which data is being sent. We configured it for iOS by installing a root certificate on our phone, which the computer can use to decrypt the data. Fiddler terminates the SSL connection with the external website, then encrypts the data to our phone using its own root certificate. It allows us to see statistics on which sites were visited, which assets were loaded, and more.


©2015 Telerik

 

The Puffin browser made things more challenging because we were unable to see the contents of pages after installing the Fiddler root certificate. It’s possible Puffin uses a technique called certificate pinning. Nevertheless, we were still able to see the number of bytes being sent over the connection to our phone and which servers it connected to.

 

Which Browser Has the Best Mobile Performance?

Here are the results of measuring the page-load time for each of the mobile browsers against our three chosen websites. Faster page load times are better.

Browser          Google.com    Reddit.com    Buzzfeed.com
Safari           3.48s         5.50s         8.67s
Chrome           1.03s         4.93s         5.93s
Firefox          1.89s         3.47s         3.50s
Firefox Focus    2.67s         4.90s         5.70s
Puffin           0.93s         2.20s         2.40s

 

The clear winner in the performance category is Puffin, which loaded pages about twice as fast as most other browsers. Surprisingly, it even loaded Google faster than Chrome. Puffin claims the speed is due to a proprietary compression technology. Most modern browsers support gzip compression, but it’s up to site operators to enable it. Puffin can compress all content by passing it through its own servers first. It can also downsize images and videos so they load faster on mobile.

 

Another reason Puffin was so much faster is because it connected to fewer hosts. Puffin made requests to only 14 hosts, whereas Safari made requests to about 50 hosts. Most of those extra hosts are third-party advertisement and tracking services. Puffin was able to identify them and either remove them from the page or route calls through its own, faster servers at cloudmosa.net.

Puffin                           Safari
vid.buzzfeed.com: 83             img.buzzfeed.com: 51
google.com: 9                    www.google-analytics.com: 16
www.google.com: 2                www.buzzfeed.com: 14
en.wikipedia.org: 2              tpc.googlesyndication.com: 9
pointer2.cloudmosa.net: 2        securepubads.g.doubleclick.net: 7
data.flurry.com: 2               pixiedust.buzzfeed.com: 7
www.buzzfeed.com: 2              vid.buzzfeed.com: 6
pivot-ha2.cloudmosa.net: 1       cdn-gl.imrworldwide.com: 6
p40-buy.itunes.apple.com: 1      www.facebook.com: 6
gd11.cloudmosa.net: 1            sb.scorecardresearch.com: 3
gd10.cloudmosa.net: 1            cdn.smoot.apple.com: 3
gd9.cloudmosa.net: 1             pagead2.googlesyndication.com: 3
collector.cloudmosa.net: 1       video-player.buzzfeed.com: 3
www.flashbrowser.com: 1          gce-sc.bidswitch.net: 3
                                 secure-dcr.imrworldwide.com: 3
                                 connect.facebook.net: 3
                                 events.redditmedia.com: 3
                                 s3.amazonaws.com: 2
                                 thumbs.gfycat.com: 2
                                 staticxx.facebook.com: 2
                                 id.rlcdn.com: 2
                                 i.redditmedia.com: 2
                                 googleads.g.doubleclick.net: 2
                                 videoapp-assets-ak.buzzfeed.com: 2
                                 c.amazon-adsystem.com: 2
                                 buzzfeed-d.openx.net: 2
                                 pixel.quantserve.com: 2
                                 … 20 more omitted

 

It’s great Puffin was able to load data so quickly, but it raises some privacy questions. Any users of this browser are giving CloudMosa access to their entire browsing history. While Firefox and Chrome let you opt out of sending usage data, Puffin does not. In fact, it’s not possible to turn this tracking off without sacrificing the speed improvements. The browser is supported by ads, although its privacy policy claims it doesn’t keep personal data. Each user will have to decide if he or she is comfortable with this arrangement.

 

Which Browser Uses the Least Mobile Data?

Now let’s look at the amount of data each browser uses. Again, we see surprising results:

Browser          Google.com    Reddit.com    Buzzfeed.com
Safari           0.82MB        2.89MB        4.22MB
Chrome           0.81MB        2.91MB        5.46MB
Firefox          0.82MB        2.62MB        3.15MB
Firefox Focus    0.79MB        2.61MB        3.13MB
Puffin           0.54MB        0.17MB        42.2MB

 

Puffin was the clear leader for loading google.com and it dominated reddit.com by a factor of 10. It claims it saved 97% of data usage on reddit.com.


©2015 Telerik

 

However, Puffin lost on buzzfeed.com by a factor of 10. In Fiddler, we saw that it made 83 requests to vid.buzzfeed.com. It appears it was caching video data in the background so videos would play faster. While doing so saves the user time, it ends up using way more data. On a cellular plan, this approach could quickly eat up a monthly cap.

 

As a result, Firefox Focus came in the lead for data usage on buzzfeed.com. Since Firefox Focus is configured to block trackers by default, it was able to load the page using the least amount of mobile data. It was also able to avoid making requests to most of the trackers listed in the Buzzfeed section above. In fact, if we take away Puffin, Firefox Focus came in the lead consistently for all the pages. If privacy is important, Firefox Focus could be a great choice for you.

 

How to Test Your Website Performance

Looking at the three websites we tested, we see an enormous difference in page-load time and in the amount of data used. This matters because higher page-load times are correlated with higher bounce rates and even fewer online purchases.

 

Pingdom® makes it even easier to test your own website’s performance with page speed monitoring. It gives you a report card showing how your website compares with others in terms of load time and page size.

To get a better idea of your customer’s experience, you can see a film strip showing how content on the page loads over time. Below, we can see that Reddit takes about two seconds until it’s readable. If we scroll over, we’d see it takes about six seconds to load all the images.

The SolarWinds® Pingdom® solution also allows us to dive deeper into a timeline view showing exactly which assets were loaded and when. The timeline view helps us see if page assets are loading slowly because of network issues or their size, or because third parties are responding slowly. The view will give us enough detail to go back to the engineering team with quantifiable data.

Pingdom offers a free version that gives you a full speed report and tons of actionable insights. The paid version also gives you the filmstrip, tracks changes over time, and offers many more website monitoring tools.

 

Conclusion

The mobile browser you choose can make a big difference in terms of page-load time and data usage. We saw that the Puffin browser was able to load pages much faster than the default Safari browser on an Apple iPhone 7. Puffin also used less data to load some, but not all, pages. However, for those who care about privacy and saving data on their mobile plan, Firefox Focus may be your best bet.

 

Because mobile performance is so important for customers, you can help improve your own website using the Pingdom page speed monitoring solution. This tool will give you a report card to share with your team and specific actions you can take to make your site faster.

We’re no strangers to logging from Docker containers here at SolarWinds® Loggly®. In the past, we’ve demonstrated different techniques for logging individual Docker containers. But while logging a handful of containers is easy, what happens when you start deploying dozens, hundreds, or thousands of containers across different machines?

In this post, we’ll explore the best practices for logging applications deployed using Docker Swarm.

Intro to Docker Swarm

Docker Swarm is a container orchestration and clustering tool from the creators of Docker. It allows you to deploy container-based applications across a number of computers running Docker. Swarm uses the same command-line interface (CLI) as Docker, making it more accessible to users already familiar with Docker. And as the second most popular orchestration tool behind Kubernetes, Swarm has a rich ecosystem of third-party tools and integrations.

A swarm consists of manager nodes and worker nodes. Managers control how containers are deployed, and workers run the containers. In Swarm, you don’t interact directly with containers, but instead define services that describe what the final deployment will look like. Swarm handles deploying, connecting, and maintaining these containers until they meet the service definition.

For example, imagine you want to deploy an Nginx web server. Normally, you would start an Nginx container on port 80 like so:

$ docker run --name nginx --detach --publish 80:80 nginx

With Swarm, you instead create a service that defines what image to use, how many replica containers to create, and how those containers should interact with both the host and each other. For example, let’s deploy an Nginx image with three containers (for load balancing) and expose it over port 80.

$ docker service create --name nginx --detach --publish 80:80 --replicas 3 nginx

When the deployment is done, you can access Nginx using the IP address of any node in the Swarm.

© 2011-2018 Nginx, Inc.

To learn more about Docker services, see the services documentation.

 

The Challenges of Monitoring and Debugging Docker Swarm

Besides the existing challenges in container logging, Swarm adds another layer of complexity: an orchestration layer. Orchestration simplifies deployments by taking care of implementation details such as where and how containers are created. But if you need to troubleshoot an issue with your application, how do you know where to look? Without comprehensive logs, pinpointing the exact container or service where an error occurred can become an operational nightmare.

On the container side, nothing much changes from a standard Docker environment. Your containers still send logs to stdout and stderr, which the host Docker daemon accesses using its logging driver. But now your container logs include additional information, such as the service that the container belongs to, a unique container ID, and other attributes auto-generated by Swarm.

Consider the Nginx example. Imagine one of the containers stops due to a configuration issue. Without a monitoring or logging solution in place, the only way to know this happened is to connect to a manager node with the Docker CLI and query the status of the service. And while Swarm automatically groups log messages by service through the docker service logs command, digging into a specific container’s messages can be time-consuming, since docker logs only works from the host where that container is running.
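For instance, to find out which replica failed and where it was scheduled, you would typically run something like this from a manager node (the service name matches the earlier example):

$ docker service ps nginx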

 

How Docker Swarm Handles Logs

Like a normal Docker deployment, Swarm has two primary log destinations: the daemon log (events generated by the Docker service), and container logs (events generated by containers). Swarm doesn’t maintain separate logs, but appends its own data to existing logs (such as service names and replica numbers).

The difference is in how you access logs. Instead of showing logs on a per-container basis using docker logs <container name>, Swarm shows logs on a per-service basis using docker service logs <service name>. This aggregates and presents log data from all of the containers running in a single service. Swarm differentiates containers by adding an auto-generated container ID and instance ID to each entry.

For example, the following message was generated by the second container of the nginx_nginx service, running on swarm-client1.

# docker service logs nginx_nginx
nginx_nginx.2.subwnbm15l3f@swarm-client1 | 10.255.0.2 - - [01/Jun/2018:22:21:11 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0" "-"

To learn more about the logs command, see the Docker documentation.

 

Options for Logging in Swarm

Since Swarm uses Docker’s existing logging infrastructure, most of the standard Docker logging techniques still apply. However, to centralize your logs, each node in the swarm will need to be configured to forward both daemon and container logs to the destination. You can use a variety of methods such as Logspout, the daemon logging driver, or a dedicated logger attached to each container.
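For example, if you choose the daemon logging driver approach, each node’s Docker daemon can be pointed at a syslog endpoint through its configuration file (typically /etc/docker/daemon.json). The sketch below uses a placeholder address; log-driver and log-opts are standard daemon options, and the Docker service must be restarted after the change.

{
  "log-driver": "syslog",
  "log-opts": {
    "syslog-address": "tcp://logs.example.com:514",
    "tag": "{{.Name}}"
  }
}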

 

Best Practices to Improve Logging

To log your swarm services effectively, there are a few steps you should take.

 

1. Log to STDOUT and STDERR in Your Apps

Docker automatically forwards all standard output from containers to the built-in logging driver. To take advantage of this, applications running in your Docker containers should write all log events to STDOUT and STDERR. If you write logs to files inside the container instead, you risk losing that data when the container is removed or rescheduled.
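As a minimal sketch (the logger name and format here are arbitrary), a containerized Python application can simply write its log records to standard output and let the Docker logging driver pick them up:

import logging
import sys

# Send all records to stdout so the Docker logging driver captures them
logging.basicConfig(stream=sys.stdout, level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(name)s %(message)s')

logging.getLogger('webapp').info('service started')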

 

2. Log to Syslog or JSON

Syslog and JSON are two of the most widely supported logging formats, and Docker supports both. Docker stores container logs as JSON files by default, and it includes a built-in driver for logging to Syslog endpoints. Both JSON and Syslog messages are easy to parse, contain critical information about each container, and are supported by most logging services. Many container-based loggers such as Logspout support both JSON and Syslog, and Loggly has complete support for parsing and indexing both formats.
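As an illustration, a service can be pointed at a syslog endpoint directly through Docker’s built-in driver; the address below is a placeholder:

$ docker service create --name nginx --publish 80:80 --replicas 3 --log-driver syslog --log-opt syslog-address=tcp://logs.example.com:514 nginx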

 

3. Log to a Centralized Location

A major challenge in cluster logging is tracking down log files. Services could be running on any one of several different nodes, and having to manually access log files on each node can become unsustainable over time. Centralizing logs lets you access and manage your logs from a single location, reducing the amount of time and effort needed to troubleshoot problems.

One common solution for container logs is dedicated logging containers. As the name implies, dedicated logging containers are created specifically to gather and forward log messages to a destination such as a syslog server. Dedicated containers automatically collect messages from other containers running on the node, making setup as simple as running the container.

 

Why Loggly Works for Docker Swarm

Normally you would access your logs by connecting to a manager node, running docker service logs <service name>, and scrolling down to find the entries you’re looking for. Not only is this labor-intensive, it’s also slow: you can’t easily search, and it’s difficult to automate alerts or build graphs. The more time you spend searching for logs, the longer problems go unresolved. Doing it yourself also means creating and maintaining your own log centralization infrastructure, which can become a significant project on its own.

Loggly is a log aggregation, centralization, and parsing service. It provides a central location for you to send and store logs from the nodes and containers in your swarm. Loggly automatically parses and indexes messages so you can search, filter, and chart logs in real time. No matter how large your swarm grows, Loggly can handle the log volume.

 

Sending Swarm Logs to Loggly

The easiest way to send your container logs to Loggly is with Logspout. Logspout is a container that automatically routes all log output from other containers running on the same node. When deploying the container in global mode, Swarm automatically creates a Logspout container on each node in the swarm.

 

To route your logs to Loggly, provide your Loggly Customer Token and a custom tag, then specify a Loggly endpoint as the logging destination.

# docker service create --name logspout --mode global --detach --volume=/var/run/docker.sock:/var/run/docker.sock --volume=/etc/hostname:/etc/host_hostname:ro -e SYSLOG_STRUCTURED_DATA="<Loggly Customer Token>@41058 tag=\"<custom tag>\"" gliderlabs/logspout syslog+tcp://logs-01.loggly.com:514

You can also define a Logspout service using Compose.

# docker-compose-logspout.yml
version: "3"

networks:
  logging:

services:
  logspout:
    image: gliderlabs/logspout
    networks:
      - logging
    volumes:
      - /etc/hostname:/etc/host_hostname:ro
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      SYSLOG_STRUCTURED_DATA: "<Loggly Customer Token>@41058"
      tag: "<custom tag>"
    command: syslog+tcp://logs-01.loggly.com:514
    deploy:
      mode: global

Use docker stack deploy to deploy the Compose file to your swarm. <stack name> is the name that you want to give to the deployment.

# docker stack deploy --compose-file docker-compose-logspout.yml <stack name>

As soon as the deployment is complete, messages generated by your containers start appearing in Loggly.

Configuring Dashboards and Alerts

Since Swarm automatically appends information about the host, service, and replica to each log message, we can create Dashboards and Alerts similar to those for a single-node Docker deployment. For example, Loggly automatically breaks down logs from the Nginx service into individual fields.


We can create Dashboards that show, for example, the number of errors generated on each node, as well as the container activity level on each node.


Alerts are useful for detecting changes in the status of a service. If you want to detect a sudden increase in errors, you can easily create a search that scans messages from a specific service for error-level logs.


You can select this search from the Alerts screen and specify a threshold. For example, this alert triggers if the Nginx service logs more than 10 errors over a 5-minute period.


Conclusion

While Swarm can add a layer of complexity over a typical Docker installation, logging it doesn’t have to be difficult. Tools like Logspout and Docker logging drivers have made it easier to collect and manage container logs no matter where those containers are running. And with Loggly, you can easily deploy a complete, cluster-wide logging solution across your entire environment.

Have you ever wondered what happens when you type an address into your browser? The first step is the translation of a domain name (such as pingdom.com) to an IP address. Resolving domain names is done through a series of systems and protocols that make up the Domain Name System (DNS). Here we’ll break down what DNS is, and how it powers the underlying infrastructure of the internet.

 

What is DNS?

Traffic across the internet is routed by an identifier called an IP address. You may have seen IP addresses before. IPv4 addresses are a series of four numbers, each between 0 and 255, separated by periods (for example: 123.45.67.89).

 

IP addresses are at the core of communicating between devices on the internet, but they are hard to memorize and can change often, even for the same service. To get around these problems, we give names to IP addresses. For example, when you type https://www.pingdom.com into your web browser, it translates that name into an IP address, which your computer then uses to access a server that ultimately responds with the contents of the page that your browser displays. If a new server is put into place with a new IP address, that name can simply be updated to point to the new address.
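You can watch this translation happen yourself. For example, Python’s standard library will ask the operating system’s resolver for you (a minimal sketch):

import socket

# Resolve the name to an IPv4 address using the system resolver
print(socket.gethostbyname('www.pingdom.com'))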

 

These records are stored in the name server for a given name, or “zone,” in DNS parlance. These zones can include many different records and record types for the base name and subdomains in that zone.

 

The internet is decentralized, designed to withstand failure and not rely on a single source of truth. DNS is built for this environment using recursion, which lets DNS servers query each other to find the answer to a request. A resolver works through a chain of servers, each one closer to the authoritative answer, starting from one of 13 sets of globally maintained “root” servers that act as the entry point for the rest of the system.

 

Anatomy of a DNS Request

When you type “pingdom.com” into your browser and hit enter, your browser doesn’t ask the web servers for that page directly. First, a multi-step interaction with DNS servers must happen to translate pingdom.com into an IP address usable for establishing a connection and routing traffic. Here’s what that interaction looks like:

  1. The recursive DNS server requests pingdom.com from a DNS root server. The root server replies with the IP address of the .com TLD name server.
  2. The recursive DNS server requests pingdom.com from the .com TLD name server. The TLD name server replies with the authoritative name server for pingdom.com.
  3. The recursive DNS server requests pingdom.com from the pingdom.com nameserver. The nameserver replies with the A record IP address for pingdom.com, which is returned to the client.
  4. The client requests pingdom.com from the web server at the IP address that was just resolved.

 

In subsequent requests, the recursive name server will have the IP address for pingdom.com.

This IP address is cached for a period of time determined by the pingdom.com nameserver. This value is called the time-to-live (TTL) for that domain record. A high TTL for a domain record means that local DNS resolvers will cache responses for longer and give quicker responses. However, making changes to DNS records can take longer due to the need to wait for all cached records to expire. Conversely, domain records with low TTLs can change much more quickly, but DNS resolvers will need to refresh their records more often.
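If you want to see the TTL a resolver returns for a record, here’s a rough sketch using the third-party dnspython package (the resolve call is the dnspython 2.x API; install the package first):

import dns.resolver  # third-party: pip install dnspython

answer = dns.resolver.resolve('pingdom.com', 'A')
print(answer.rrset.ttl)      # TTL in seconds for this record set
for record in answer:
    print(record.address)    # each A record's IPv4 address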

 

Not Just for the Web

The DNS protocol is for anything that requires a decentralized name, not just the web. To differentiate between various types of servers registered with a nameserver, we use record types. For example, email servers are part of DNS. If a domain name has an MX record, it is signaling that the address associated with that record is an email server.

 

Some of the more common record types you will see are:

  • A Record – used to point names directly at IPv4 addresses. This is used by web browsers.
  • AAAA Record – used to point names directly at IPv6 addresses. This is used by web browsers when a device has an IPv6 network.
  • CNAME Record – also known as the Canonical Name record and is used to point web domains at other DNS names. This is common when using platforms as a service such as Heroku or cloud load balancers that provide an external domain name rather than an IP address.
  • MX Record – as mentioned before, MX records are used to point a domain to mail servers.
  • TXT Record – arbitrary information attached to a domain name. This can be used to attach validation or other information about a domain name as part of the DNS system. A name can hold multiple records of the same type; TXT records in particular often have several entries per name.

 

DNS Security and Privacy

There are many parts to resolving a DNS request, and these parts are subject to security and privacy issues. First, how do we verify that the IP address we requested is actually the one on file with the domain’s root nameserver? Attacks exist that can disrupt the DNS chain, providing false information back to the client or triggering denial of service attacks upon sites. Untrusted network environments are vulnerable to man-in-the-middle attacks that can hijack DNS requests and provide back false results.

 

There is ongoing work to enhance the security of DNS with the Domain Name System Security Extensions (DNSSEC). This is a combination of new records, public-key cryptography, and establishing a chain of trust with DNS providers to ensure domain records have not been tampered with. Some DNS providers today offer the ability to enable DNSSEC, and its adoption is growing as DNS-based attacks become more prevalent.

 

DNS requests are also typically unencrypted, which allows attackers and observers to pry into their contents. This information is valuable, and your ISP or recursive resolver provider may be passing it to third parties or using it to track your activity. DNS traffic can also expose personally identifiable information such as your IP address, which can be correlated with other tracking information that third parties may hold.

 

There are a few ways to help protect your privacy with DNS and prevent this sort of tracking:

 

1. Use a Trusted Recursive Resolver

Using a trusted recursive resolver is the first step to ensuring the privacy of your DNS requests. For example, the Cloudflare DNS service at https://1.1.1.1 is a fast, privacy-centric DNS resolver. Cloudflare doesn’t log client IP addresses or track the requests you make against it.

 

2. Use DNS over HTTPS (DoH)

DoH is another way of enhancing your privacy and security when interacting with DNS resolvers. Even when using a trusted recursive resolver, man-in-the-middle attacks can alter the returned contents back to the requesting client. DNSSEC offers a way to fix this, but adoption is still early, and relies on DNS providers to enable this feature.

 

DoH secures this at the client to DNS resolver level, enabling secure communication between the client and the resolver. The Cloudflare DNS service offers DNS over HTTPS, further enhancing the security model that their recursive resolver provides. Keep in mind that the domain you’re browsing is still available to ISPs thanks to Server Name Indication, but the actual contents, path, and other parts of the request are encrypted.
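To get a feel for DoH, Cloudflare also exposes a JSON form of its resolver over HTTPS. The sketch below uses the documented cloudflare-dns.com endpoint and the application/dns-json content type; treat the field handling as illustrative:

import json
import urllib.request

req = urllib.request.Request(
    'https://cloudflare-dns.com/dns-query?name=pingdom.com&type=A',
    headers={'accept': 'application/dns-json'})

with urllib.request.urlopen(req) as resp:
    data = json.load(resp)

for answer in data.get('Answer', []):
    print(answer['name'], answer['TTL'], answer['data'])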

 

Even without DNSSEC, you can still have a more private internet experience. Firefox recently began routing DNS requests through the Cloudflare resolver; at the time of writing, though, DoH is only enabled by default in the nightly builds.

 

Monitoring DNS Problems

DNS is an important part of your site’s availability because a DNS problem can cause a complete outage. DNS has been known to cause outages due to BGP attacks, TLD outages, and other unexpected issues. It’s important that your uptime or health check includes DNS lookups.

With SolarWinds® Pingdom®, we can monitor for DNS problems using the uptime monitoring tool. Here we’ll change the DNS record for a domain and show you how the Pingdom tool responds. Once you have an uptime check added in Pingdom, click the “Reports” section, then “Uptime” under it, and go to the domain you’re interested in. Under the “Test Result Log” tab for an individual domain’s uptime report, hover over the failing entry to see why a check failed.

This tells us that for our domain, we have a “Non-recoverable failure in name resolution.” This lets us know to check our DNS records. After we fix the problem, our next check succeeds:

Pingdom gives us a second set of eyes to make sure our site is still up as expected.

 

Curious to learn more about DNS? Check out our post on how to test your DNS configuration. You can also learn more about Pingdom uptime monitoring.


Papertrail for Python Logs


When you’re troubleshooting a problem or tracking down a bug in Python, the first place to look for clues related to server issues is in the application log files.

 

Python includes a robust logging module in the standard library, which provides a flexible framework for emitting log messages. This module is widely used by various Python libraries and is an important reference point for most programmers when it comes to logging.

 

The Python logging module provides a way for applications to configure different log handlers and provides a standard way to route log messages to these handlers. As the Python.org documentation notes, there are four basic classes defined by the Python logging module: Loggers, Handlers, Filters, and Formatters. We’ll provide more details on these below.
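As a quick illustration of how the four classes fit together (the names below are arbitrary), a Logger routes records to a Handler, a Formatter controls the layout, and a Filter decides which records get through:

import logging

class OnlyErrors(logging.Filter):
    # A Filter inspects each record and returns True to let it through
    def filter(self, record):
        return record.levelno >= logging.ERROR

handler = logging.StreamHandler()  # Handler: where records are sent
handler.setFormatter(logging.Formatter('%(levelname)s %(name)s: %(message)s'))  # Formatter: layout
handler.addFilter(OnlyErrors())    # Filter attached to the handler

log = logging.getLogger('example')  # Logger: the entry point for messages
log.addHandler(handler)
log.warning('filtered out')   # dropped by OnlyErrors
log.error('this one appears')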

 

Getting Started with Python Logs

There are a number of important steps to take when setting up your logs. First, you need to ensure logging is enabled in the applications you use. You also need to categorize your logs by name so they are easy to maintain and search. Naming the logs makes it easier to search through large log files, and to use filters to find the information you need.

 

To send log messages in Python, request a logger object. It should have a unique name to help filter and prioritize how your Python application handles various messages. We are also adding a StreamHandler to print the log on our console output. Here’s a simple example:

import logging

logging.basicConfig(handlers=[logging.StreamHandler()])
log = logging.getLogger('test')
log.error('Hello, world')

This outputs:

ERROR:test:Hello, world

This message consists of three fields. The first, ERROR, is the log level. The second, test, is the logger name. The third field, “Hello, world”, is the free-form log message.

 

Most problems in production are caused by unexpected or unhandled issues. In Python, such problems generate tracebacks where the interpreter tries to include all important information it could gather. This can sometimes make the traceback a bit hard to read, though. Let’s look at an example traceback. We’ll call a function that isn’t defined and examine the error message.

def test():
    nofunction()

test()

Which outputs:

Traceback (most recent call last):
  File '<stdin>', line 1, in <module>
  File '<stdin>', line 2, in test
NameError: global name 'nofunction' is not defined

This shows the common parts of a Python traceback. The error message is usually at the end of the traceback. It says “nofunction is not defined,” which is what we expected. The traceback also includes the lines of all stack frames that were touched when this error occurred. Here we can see that it occurred in the test function on line two. Stdin means standard input and refers to the console where we typed this function. If we were using a Python source file, we’d see the file name here instead.

 

Configuring Logging

You should configure the logging module to direct messages to go where you want them. For most applications, you will want to add a Formatter and a Handler to the root logger. Formatters let you specify the fields and timestamps in your logs. Handlers let you define where they are sent. To set these up, Python provides a nifty factory function called basicConfig.

import logging

logging.basicConfig(format='%(asctime)s %(message)s',
                    level=logging.DEBUG,  # without this, DEBUG messages are filtered out by the default WARNING level
                    handlers=[logging.StreamHandler()])
logging.debug('Hello World!')

By default, Python will output uncaught exceptions to your system’s standard error stream. Alternatively, you could add a handler to the excepthook to send any exceptions through the logging module. This gives you the flexibility to provide custom formatters and handlers. For example, here we log our exceptions to a log file using the FileHandler:

import logging
import sys

logger = logging.getLogger('test')
fileHandler = logging.FileHandler('errors.log')
logger.addHandler(fileHandler)

def my_handler(type, value, tb):
  logger.exception('Uncaught exception: {0}'.format(str(value)))

# Install exception handler
sys.excepthook = my_handler

# Throw an error
nofunction()

Which results in the following log output:

$ cat errors.log
Uncaught exception: name 'nofunction' is not defined
None

In addition, you can filter logs by configuring the log level. One way to set the log level is through an environment variable, which gives you the ability to customize the log level in the development or production environment. Here’s how you can use the LOGLEVEL environment variable:

$ export LOGLEVEL='ERROR'
$ python
>>> import logging
>>> import os
>>> logging.basicConfig(level=os.environ.get('LOGLEVEL', 'WARNING'), handlers=[logging.StreamHandler()])
>>> logging.debug('Hello World!') # prints nothing
>>> logging.error('Hello World!')
ERROR:root:Hello World!

Logging from Modules

Modules intended for use by other programs should only emit log messages. These modules should not configure how log messages are handled. A standard logging best practice is to let the Python application importing and using the modules handle the configuration.

Another standard best practice to follow is that each module should use a logger named like the module itself. This naming convention makes it easy for the application to distinctly route various modules and helps keep the log code in the module simple.

You need just two lines of code to set up logging using a named logger. Once you do this, __name__ contains the full name of the current module, so the same pattern works in any module. Here’s an example:

import logging

log = logging.getLogger(__name__)

def do_something():
    log.debug('Doing something!')
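On the application side, the program importing the module (called mymodule here purely for illustration) configures logging once, and the module’s named logger inherits that configuration:

import logging
import mymodule  # hypothetical module containing the code above

# The application, not the module, decides levels, formats, and destinations
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(name)s %(levelname)s %(message)s')

mymodule.do_something()  # emits a DEBUG record under the "mymodule" logger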

Analyzing Your Logs with Papertrail

Python applications on a production server can generate millions of log entries. Command line tools like tail and grep are often useful during the development process. However, they may not scale well when analyzing millions of log events spread across multiple servers.

 

Centralized logging can make it easier and faster for developers to manage a large volume of logs. By consolidating log files onto one integrated platform, you can eliminate the need to search for related data that is split across multiple apps, directories, and servers. Also, a log management tool can alert you to critical issues, helping you more quickly identify the root cause of unexpected errors, as well as bugs that may have been missed earlier in the development cycle.

 

For production-scale logging, a log management tool such as SolarWinds® Papertrail™ can help you better manage your data. Papertrail is a cloud-based platform designed to handle logs from any Python application, including Django servers.

 

The Papertrail solution provides a central repository for event logs. It helps you consolidate all of your Python logs using syslog, along with other application and database logs, giving you easy access all in one location. It offers a convenient search interface to find relevant logs. It can also stream logs to your browser in real time, offering a “live tail” experience. Check out the tour of the Papertrail solution’s features.


Papertrail is designed to help minimize downtime. You can receive alerts via email, or send them to Slack, Librato, PagerDuty, or any custom HTTP webhooks of your choice. Alerts are also accessible from a web page that enables customized filtering. For example, you can filter by name or tag.


 

Configuring Papertrail in Your Application

There are many ways to send logs to Papertrail depending on the needs of your application. You can send logs through journald, log files, Django, Heroku, and more. We will review the syslog handler below.

 

Python can send log messages directly to Papertrail with the Python SysLogHandler. Just set the endpoint to the log destination shown in your Papertrail settings. You can optionally format the timestamp or set the log level as shown below.

import logging
import socket
import sys  # needed for sys.excepthook below
from logging.handlers import SysLogHandler

syslog = SysLogHandler(address=('logsN.papertrailapp.com', XXXXX))
format = '%(asctime)s YOUR_APP: %(message)s'
formatter = logging.Formatter(format, datefmt='%b %d %H:%M:%S')
syslog.setFormatter(formatter)

logger = logging.getLogger()
logger.addHandler(syslog)
logger.setLevel(logging.INFO)

def my_handler(type, value, tb):
  logger.exception('Uncaught exception: {0}'.format(str(value)))

# Install exception handler
sys.excepthook = my_handler

logger.info('This is a message')

nofunction() # log an uncaught exception

Conclusion

Python offers a well-thought-out framework for logging that makes it simple to enable and manage your log files. Getting started is easy, and a number of tools baked into Python automate the logging process and help ensure ease of use.

Papertrail adds even more functionality and tools for diagnostics and analysis, enabling you to manage your Python logs on a centralized cloud server. Quick to set up and easy to use, Papertrail consolidates your logs on a safe and accessible platform. It simplifies your ability to search log files, analyze them, and act on them in real time, so you can focus on debugging and optimizing your applications.

Learn more about how Papertrail can help optimize your development ecosystem.

For an infrastructure to be healthy, it needs good monitoring. The team should have a monitoring setup that speeds up and simplifies the investigation of problems, supporting prevention, maintenance, and correction. SolarWinds® AppOptics™ was created to help monitoring teams keep control of their infrastructure, including Linux monitoring.

 

Monitoring overview

It is critical for a technology team to be prepared for whatever happens in their environment. The purpose of monitoring is to be aware of changes in the environment so problems can be addressed with immediate action. A good monitoring history and careful observation also let you propose improvements based on the charts. For example, if a server shows high memory usage over a sustained period, you can purchase more memory or investigate the cause of the abnormal behavior before the environment becomes unavailable.

 

Monitoring data can be used for various purposes, such as tracking application availability for a given number of users, following tool deployments, watching operating system update behavior, and supporting purchase requests, replacements, or hardware upgrades. How you use it depends on the purpose of your deployment.

 

Linux servers have historically been difficult to monitor because most tools on the market target other platforms. In addition, many IT professionals struggle to get monitoring working properly on these servers, so when a disaster occurs, it’s difficult to identify what happened.

 

Constant monitoring of servers and services used in production is critical for company environments. Server failures in virtualization, backup, firewalls, and proxies can directly affect availability and quality of service.

 

The Linux operating system offers basic monitoring facilities aimed at more experienced administrators, but effective monitoring calls for real-time reports that support immediate action. You can’t count on an experienced system administrator always being available to access the servers, or on them covering every monitoring need by hand.

 

In the current job market, it’s important to remember that Linux specialists are rare and their availability is limited. There are cases where an expert administrator can only get to a server after the problem has been present for a long time. Training teams to become Linux experts can be expensive and time-consuming, with potentially low returns.

 

Metrics used for monitoring

  1. CPU – It is crucial to monitor the CPU, as it can reach high utilization and temperature. A CPU may have multiple cores, but an application can be tied to only one of those cores, saturating it while the others sit idle, which points to dangerous behavior.

  2. Load – Load average indicates how much demand is being placed on the CPU: how many processes are running or waiting to run, and how that demand has trended over recent minutes.

  3. Disk Capacity and IO – Disk capacity is especially important for image servers, file servers, and VMs, since a full or failing disk can shut the system down, corrupt the operating system, or cause extreme IO slowness. Along with disk monitoring, you can plan for an eventual change or addition of a disk and watch the behavior of a disk showing signs of hardware failure.

  4. Network – For servers providing DNS, DHCP, firewall, file, and proxy services, it is extremely important to monitor network performance, including inbound and outbound packet traffic. With network performance logs, you can measure utilization of the interface and plan capacity according to how the network is actually used.

  5. Memory – Monitoring memory alongside the other components helps you catch situations where memory exhaustion, or a single application consuming far more than its share, can bring a system to an immediate stop.

  6. Swap – This is virtual memory created by the system and allocated on disk to be used when necessary. High swap utilization can indicate that the amount of memory on the server is insufficient.

With this information from your Linux systems, you can maintain good monitoring and a team that can act immediately on downtime that could paralyze critical systems.
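To get a sense of the raw numbers behind these metrics, here’s a small illustrative Python sketch (standard library only, Linux-specific paths); it shows the kind of data a monitoring agent collects and is not how the AppOptics agent itself works:

import os

# Load average over the last 1, 5, and 15 minutes
one, five, fifteen = os.getloadavg()
print('load average:', one, five, fifteen)

# Memory figures straight from the kernel, reported in kB
meminfo = {}
with open('/proc/meminfo') as f:
    for line in f:
        key, value = line.split(':', 1)
        meminfo[key] = int(value.split()[0])

print('MemTotal kB:', meminfo['MemTotal'])
print('MemAvailable kB:', meminfo.get('MemAvailable'))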

 

 

Monitoring with AppOptics

AppOptics is a web-based monitoring tool that lets you set up real-time monitoring, create e-mail alerts, and work with thresholds and monitoring history. You can also create monitoring levels with profiles for the equipment being monitored, and give operators simple monitoring views from which they can escalate to a specialist or open a ticket for immediate action when needed.

 

This tool can also be an ally of an ITIL/COBIT team, which can use the reports to justify scheduled and unscheduled stops and to identify systems that historically have problems. It can also be used to justify the purchase of new equipment, software upgrades, or the migration of a system that no longer meets the company’s needs.

 

AppOptics can be installed in major Linux distributions such as Red Hat, CentOS, Ubuntu, Debian, Fedora, and Amazon Linux. Its deployment is easy, fast, and practical.

 

 

Installing the AppOptics Agent on the Server

Before you start, you’ll need an account with AppOptics. If you don’t already have one, you can create a demo account which will give you 14 days to try the service, free of charge. Sign up here.

 

First, to allow AppOptics to aggregate the metrics from the server, you will need to install the agent on all instances. To do this, you’ll need to reference your AppOptics API token when setting up the agent. Log in to your AppOptics account and navigate to the Infrastructure page.

 

Locate the Add Host button, and click on it. It should look similar to the image below.

Fig. 2. AppOptics Host Agent Installation

 

You can follow a step-by-step guide on the Integration page, where there are Easy Install and Advanced options for users. I used an Ubuntu image in the AWS Cloud, but this will work on almost any Linux server.

 

Note: Prior to installation of the agent, the bottom of the dialog below will not contain the success message.

 

Copy the command from the first box, and then SSH into the server and run the Easy Install script.

 

Fig. 3. Easy Install Script to Add AppOptics Agent to a Server

 

When the agent installs successfully, you should be presented with the following message on your terminal. The “Confirm successful installation” box on the AppOptics agent screen should look similar to the above, with a white on blue checkbox. You should also see “Agent connected.”

 

Fig. 4. Installing the AppOptics Agent on your Linux Instance

 

After installation, you can start configuring the dashboard for monitoring on the server. Click on the hostname link in the Infrastructure page, or navigate to the Dashboards page directly, and then select the Host Agent link to view the default dashboard provided by AppOptics.

 

Working with the Host Agent Dashboard

The default Host Agent Dashboard provided by AppOptics offers many of the metrics discussed earlier, related to the performance of the instance itself, and should look similar to the image below.

 

Fig. 6. Default Host Agent Dashboard

 

One common pattern is to create dashboards for each location you want to monitor. Let’s use “Datacenter01” for our example. Head to Dashboards and click the Create a New Dashboard button.

 

You can choose the type of display (Line, Stacked, or Big Number), and then choose what you want to monitor, such as CPU Percent, Swap, or Load. In addition, within the dashboard, you can select how long to monitor a group of equipment or set it to be monitored indefinitely.

 

Fig. 8. Custom Dashboard to View Linux System Metrics

 

Metrics – You can select existing metrics and combine them into new composite metrics according to what you want to monitor in the operating system.

 

Alerts – You can create alerts on operating system metrics, including how often a new alert is issued and the conditions that trigger it.

 

Integrations – You can add host agent plug-ins to add support for application monitoring.

 

 

Conclusion

Monitoring your Linux servers is critical as they represent the basis of your infrastructure. You need to know immediately when there is a sudden change in CPU or memory usage that could affect the performance of your applications. AppOptics has a range of ready-made tools, customizable monitoring panels, and reports that are critical for investigating chronic infrastructure problems. Learn more about AppOptics infrastructure monitoring and try it today with a free 14-day trial.
