Monitoring Central Blogs


Level 7

It's Wednesday morning, and it’s time to start working on this week’s CAB meeting action items. Looks like we have some routine changes coming up, which means digging into past changes, identifying relevant plans, looking into what worked well (and what didn't), and reviewing their implementation, testing, and rollback plans, repeating this process for every change...fun, right?

Read more...

Product Manager

Databases running in virtualized environments now make up the majority of production databases. Read about the popularity of virtualized databases in 2020.

Read more...

Product Manager

We’re excited to introduce you to a solution designed to help you meet the process automation, scalability, and security requirements of your organization. Enter SolarWinds Service Desk Enterprise! This new, top-tier plan was designed to address the needs of large organizations with advanced IT service management (ITSM) practices.  

This new plan includes capabilities to address the management needs of complex IT infrastructure, scale delivery of IT services and support, and strengthen your overall service desk security posture. Service Desk Enterprise includes:

  • New CMDB Data Model and CMDB Visual Map support richer and more advanced configurations while helping you navigate and understand complex configuration models.
  • Multi-Factor Authentication offers another layer of access control to your service desk.
  • Change Templates and Workflows drive consistency while accelerating change management.
Read more...

Product Manager

Many organizations have already moved their IT infrastructure to the cloud or are in the process of doing so: 61% of organizations run workloads in the cloud today, and this is expected to increase to 85% of organizations by 2021.¹ As a result, many customers are using Amazon Web Services (AWS), the world’s leading cloud computing platform, which holds 48% of the $32.4 billion Infrastructure as a Service (IaaS) market.

While moving to the cloud improves elasticity and flexibility, it creates new management challenges, such as a lack of visibility into IT assets. Many IT organizations lose visibility into their assets when they move components like servers, storage, networking, databases, and other infrastructure to AWS. This makes it difficult to get a comprehensive picture of your IT environment, an understanding of your spending and risk levels, as well as the impact of your cloud infrastructure on the IT services you offer to employees.

To address these challenges, we’re excited to announce that SolarWinds Discovery now integrates with AWS to enable you to accurately discover, map, and manage your assets from AWS within SolarWinds Service Desk!  

Read more...

Product Manager

There are many benefits to monitoring critical web applications from the perspective of your end users’ experience. We’re not going to talk about them here. What we’ll cover in this post are the options SolarWinds provides to help you monitor their experiences AND help you determine which solution(s) could be most helpful for your organization. We’ll be looking at SolarWinds® Web Performance Monitor (WPM) and SolarWinds Pingdom®, and we’ll start with a high-level description.

Read more...

Level 8

We know service desk teams are absolutely swamped, so here are a couple of things you can do to make your life easier.

Read more...

Product Manager

A centralized, up-to-date IT asset repository can be valuable. It provides insights into IT assets, helps reduce overspending on unnecessary infrastructure, identifies potential risks, and can even improve compliance with software purchases. Knowing what assets you have is also an important first step toward building a Configuration Management Database (CMDB), which helps you understand how your underlying infrastructure supports IT service management (ITSM) and impacts the services you provide to employees.

This is why we launched SolarWinds Discovery in 2019 to accurately discover, map, and manage your IT assets in SolarWinds® Service Desk. Today, we’re excited to announce that SolarWinds Discovery now integrates with the Orion® Platform, which enables you to consolidate and gain visibility into IT asset information from the Orion Platform within SolarWinds Service Desk.

Read more...

Level 9

The SCAR (See – Change – Audit – Request) framework is designed to help you gain control over user permissions across your IT infrastructure. The framework consists of four steps to enhance your overall security posture and audit permissions to your data and resources at any time—while making access rights management more sustainable, efficient, and transparent.

ARM_rich-client_082019.png

See Your Permissions Infrastructure

Do you know who has access to your most valuable data? A fully transparent access rights structure is crucial for meeting various compliance regulations and audit requirements. To help mitigate threats and reach compliance goals, you need to SEE who has access to specific elements within your systems, data, and files.

Typically, within minutes, SolarWinds® Access Rights Manager (ARM) allows you to generate a report on exactly who has access to what.

 

Report-Who has access where.png

Permissions can often prove complex: Admins leave; new admins rework old structures; employees join, move within the company, and leave. This can get overwhelming quickly, and with companies growing organically, access rights to certain resources, folders, and data are not always as transparent as they need to be.
With ARM, you can analyze your permissions structure based on a single user or resource. Typically, within minutes, you can gain a transparent overview of:

  • Nested group structures
  • Overprivileged users
  • Globally accessible directories
  • Members of different groups
  • Empty or recursive groups
  • Inactive accounts

Easily Change Permissions with ARM

ARM is designed to set new standards in the field of changing user permissions, also known as user provisioning. Not only does ARM allow you to change user permissions, group memberships, and other Active Directory parameters, it simplifies the process by enabling help desk and data owners to do so as well. ARM lets you set permission templates by department, helping increase consistency across the organization and streamlining the entire Joiner-Mover-Leaver process.

ARM-create account in AD-20194.PNG

To learn more about how ARM can help you change your user permissions to demonstrate compliance and gain efficiency, please read our whitepaper, Joiner, Mover, Leaver: User Provisioning with SolarWinds Access Rights Manager.

ARM can help you increase efficiency and demonstrate compliance via access rights delegation to data owners and help desk departments.

Audit Permissions Without the Hassle

We’ve already discussed how ARM allows you to fully SEE your permissions structure and CHANGE access rights as part of the SolarWinds SCAR framework (See – Change – Audit – Request).

When it comes to auditing permissions and user access rights, built-in tools and scripts often reach their limits. ARM is designed to help you AUDIT your permissions infrastructure anytime and with minimal effort. Whether your audit process is driven by internal requirements or by mandates like GDPR, PCI DSS, HIPAA, or SOX (or all four), detailed reporting is critical to demonstrate compliance.

ARM-report-OU members and group memberships-20194.PNG

Read our Top 7 Audit Prep Reports whitepaper to discover what each of the seven reports provides, and how ARM can help you overcome common challenges in the audit process.

These seven reports allow you to audit:

  • User and group access
  • Overprivileged accounts
  • Risky group configurations (empty or recursive groups)
  • Inactive and temporary accounts
  • Insecure account configurations
  • Permissions differences monitoring
  • Historical AD structure

Analyzing Active Directory permissions can quickly become overwhelming. ARM helps IT teams quickly analyze authorizations and access permissions, helping reduce the risk of failed audits and stolen data.

Access Requests Made Simple

We’ve covered how the SolarWinds SCAR framework (See – Change – Audit – Request) and ARM allow you to SEE your permission structure, efficiently CHANGE access rights, and easily AUDIT user access rights.

With SolarWinds Access Rights Manager (ARM), employees can also REQUEST access rights directly from the data owner. Data owners, most often team leads or department heads, know best who should have access to what—and why. With ARM, you can have data owners directly handle access requests from employees with a web-based, self-service permissions portal.

If you already benefit from Microsoft enhanced management and remote controls, ARM can add monitoring, auditing, provisioning, and process optimization.

Want to learn more about ARM? Download a free trial here.

Product Manager

Background

 

Fire-fighting mode for DBAs can be stressful when they have co-workers and managers breathing down their necks due to application slowdowns and/or outages. Logic says something changed, but what? In a worst-case scenario, the database instance itself looks fine, nothing changed within the database, and the SQL being executed was running fine before. Of course, the SysAdmin says nothing is wrong with the physical server or storage, which makes it even more puzzling. Hmm, could you be running in a virtual machine (VM)? Is your VM resource-starved and competing with other VMs?

 

According to Gartner’s Market Guide for Server Virtualization[1], “Hypervisor-based server virtualization is now mature, with 80% to 90% of server workloads running in a virtual machine (VM) for most midsize to large enterprises.” Additionally, anecdotal evidence states 70% of all databases are virtualized. In fact, here at SolarWinds, 50% of our database instances run in a VM. For all the benefits of virtualization like cost savings and ease of migrating workloads, the abstraction of the virtual layer from the physical hardware can introduce some challenges.

 

And let’s not forget the elephant in the room: snapshots. Many DBAs I’ve talked to are at a loss as to why SysAdmins and IT ops perform snapshots of their database instance VMs, which in turn can cause performance issues, especially if a memory snapshot is invoked, which renders the VM inactive while the memory is written to disk. Database backups are best left to DBAs, who ensure referential integrity is maintained so the database can be recovered.

 

Which Metrics Matter?

 

If you find yourself running your database instances in a VMware VM, what do you need to look for to see if the VM your database is running in has problems? There are many metrics available, so let’s review the usual suspects.

 

CPU Ready

 

  • This metric indicates the VM (and the database trying to run inside it) was ready to run but instead sat idle, waiting behind other VMs contending for the same shared resources, such as physical CPUs or memory.

    For example, a vSphere host has six physical CPUs, and two VMs are configured to each require four virtual CPUs (vCPUs) before they can run. This situation means only one VM can run at a time. You can eliminate the VMs queueing behind each other by either moving a VM to another host or configuring both VMs to require three or fewer virtual CPUs.

 

    • The term “oversubscription” simply means you’ve assigned more virtual resources than the physical resources that exist to run all VMs concurrently. It may seem a bit strange, but reducing the number of vCPUs assigned to a VM may dramatically increase its performance. Generally, oversubscription should not go above 5% (see the quick calculation sketched below). With the SolarWinds® Database Performance Analyzer (DPA) VM Option, an easy way to see how many physical CPUs your host server has is to view the Host tab on the VM CONFIG page.

 

pastedImage_0.png
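
To make the arithmetic concrete, here's a minimal sketch in plain Python (the VM counts are the made-up example from above, not DPA output) that estimates how oversubscribed a host's CPUs are:

```python
# Rough sketch: estimate vCPU oversubscription for a single host.
# The vCPU counts below mirror the illustrative example above.

def oversubscription_pct(vcpus_per_vm, physical_cpus):
    """Percentage of assigned vCPUs beyond the host's physical CPUs."""
    total_vcpus = sum(vcpus_per_vm)
    extra = max(0, total_vcpus - physical_cpus)
    return 100.0 * extra / physical_cpus

# A 6-pCPU host running two 4-vCPU VMs: only one VM can be scheduled at a time.
print(oversubscription_pct([4, 4], 6))  # ~33.3% oversubscribed
# Reconfigured to two 3-vCPU VMs, both fit concurrently.
print(oversubscription_pct([3, 3], 6))  # 0.0
```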

VM CPU Usage

  • Actively used CPU as a percent of total available virtual CPU in the virtual machine.

Host CPU Usage

 

  • Actively used CPU as a percent of total available CPU on the machine. If this number is high, you might see VMs with high CPU ready and/or co-stop.
    • Active CPU is approximately the ratio of used CPU to available CPU, where available CPU = number of physical CPUs x clock rate (a quick version of this calculation is sketched below).
    • When your database instance is running in a VM, with the VM Option, DPA automatically expands the data in the CPU tab to include this information along with other VM-specific metrics.

 

pastedImage_5.png
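
As a quick illustration of the formula above (hand math, not a DPA API), host CPU usage can be approximated like this:

```python
# Sketch of the host CPU usage calculation described above.
# The numbers are illustrative; in practice they come from hypervisor stats.

def host_cpu_usage_pct(used_mhz, physical_cpus, clock_rate_mhz):
    """Used CPU as a percentage of total available host CPU."""
    available_mhz = physical_cpus * clock_rate_mhz  # available CPU = # pCPUs x clock rate
    return 100.0 * used_mhz / available_mhz

# e.g., 6 physical cores at 2,600 MHz with 12,480 MHz in use -> 80%
print(host_cpu_usage_pct(12_480, 6, 2_600))
```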

Co-Stop

 

  • Co-stop is the time a VM was ready to run but waited for vCPU scheduling (a lack of resources). So basically, your VM can be waiting on physical CPU resources in use by other VMs. If you see high Host CPU Usage, this is probably a sign there are too many VMs on this host and/or you need more physical CPU resources.

 

VM Memory Swap Rate

 

  • Nonzero “swap in” and “swap out” rates generally mean you have a shortage of physical memory on the host, so VM memory is being swapped out to and back in from disk.

 

VM Active Memory Usage

  • This is the memory in use as a percent of the memory configured for the VM.

 

Host Memory Usage

  • This is the memory usage on the host (consumed memory / total machine memory). If this is high (e.g., greater than 90%), it could indicate host memory overcommit, which can lead to high VM swap rates.

VM Memory Overhead

  • This is simply the amount of memory used to run the VM. Over-configuring memory (or excess vCPU for that matter) will unnecessarily increase overhead. That said, there’s memory needed by ESXi itself and the virtual machine (virtual machine frame buffer).

 

VM Memory Balloon

  • The balloon driver reclaims pages on the server considered less valuable. The crux of this VMware proprietary technique is that it works with the guest OS to decide which pages to give up. You should only see ballooning when the host is running low on, or is out of, physical memory.
  • If you see the virtual machine your database instance is running in has a certain percentage of memory claimed by the balloon driver, look for memory swapping, which could affect your VM’s performance. However, if you don’t see any swapping issues, ballooning by itself won’t necessarily cause a performance problem.

 

VM Disk Commands

  • The number of disk commands executed is an indication of how busy the disks are. That said, unless you see large queues developing and commands starting to be aborted, there isn’t a problem.
  • If you do see aborted disk commands, your storage is severely overloaded, which can lead to serious application response issues.

 

VM Disk Usage

  • Available if you aren’t using an NFS datastore, this shows the average disk I/O rates across all virtual disks on the VM.

 

VM Read / Write Rates

  • VM disk read rate is the average amount of data read from the disk each second during the collection interval. For a VM, this is the rate at which data is read from each virtual disk to the virtual machine.
  • VM disk write rate is the average amount of data written to disk each second during the collection interval—simply the rate data is written to each virtual disk on the VM.

 

Host Disk Device Read / Write Rates

  • The host disk read-and-write rate is the average read/write rate across all disks/LUNs on the host. The rate represents the read/write throughput at the host level across all disks/LUNs and VMs running on the host.
    • If the database instance has I/O performance issues, you may have another VM on the same host causing the delays. Compare this metric to the physical I/O rate from the database instance. If the Host rate is higher, then it’s likely another VM is the problem. Otherwise, the VM your instance is running in may be causing too much of a demand on the underlying physical storage.

Host Max Disk Latency

  • This is the highest latency value across all disks used by this host.

 

Host Disk Latency

  • Read latency is the average amount of time to process a read command issued to a disk on the host (across all VMs). High disk latency indicates storage may be slow or overloaded.
  • Write latency is similar and is the average amount of time to process a write command to the specific disk across all VMs.
    • Disk Write Latency = Kernel Write Latency + Device Write Latency
  • Expected disk latencies will depend on the nature of the workload (read/write mix, randomness, and I/O size) along with the capability of the storage subsystem.

 

In addition to these metrics being found in DPA, you can execute the “esxtop” command from your VMware ESXi host or look at various utilization metrics from the VMware ESXi console. SolarWinds Virtualization Manager also reports on all of these metrics and more in a friendlier format with both historical and real-time data.

 

esxtop screen capture.png
ESXi Screen 2.PNG

 

 

Sample Nightmare Scenario Avoided

 

As I mentioned when I started off, a nightmare scenario could be when everything associated with the database instance seems fine—nothing changed. Since we’ve covered the essential VM metrics you should be monitoring, let’s walk through a hard-to-find problem for a database instance running in a VM using SolarWinds Database Performance Analyzer (DPA) with the VM Option. In the 2019.4 release of DPA, we expanded the VM option to go beyond the basic resource metrics to include additional HOST metrics and to make note of events, as seen in the DPA CPU tab in RESOURCES.

 

pastedImage_10.png

* Example of event logging in DPA 2019.4

 

 

Let’s walk through our sample “nightmare” scenario.

 

  • Problem ticket opened for poor application performance/response time
    • Users complained that on the morning of Monday, December 2, “around 8 a.m.,” they experienced abnormally long wait times.

 

  • No outages were recorded from the IT Ops group

 

  • You go to DPA to look at the Database instance supporting the application
    • You notice a longer than normal wait occurrence on December 2, and the machine learning anomaly detection flags this time as a critical wait time delta from what is normally expected at this time of day.

pastedImage_5.png

 

 

  • You then look at the ADVISORS tab for additional data for this day.

    1. As it turns out, a specific query accounts for the top amount of execution time.
      pastedImage_12.png

  • You select this query to find out more about it and what occurred at the time. From the QUERY DETAIL page, you see the longest wait time was for memory/CPU; you click the green bar for memory/CPU to explore further by drilling down to the hour.

    pastedImage_13.png

 

  • Once you get down to the hourly view, you see a noticeable spike in wait time in the morning hours when the application response time issue was occurring.
  • As you scroll down the page to the end where VM metrics are shown, you see the new co-stop metric where there’s a corresponding spike. By hovering over the annotation dots, you see during this time the VM was being moved via vMotion from one host to another.


pastedImage_14.png

pastedImage_16.png

  • Just as with snapshots, vMotion events can have a negative impact on the performance of the VM the database instance is running in. Without visibility into the virtualized infrastructure, it can be time-consuming to find the culprit of poor performance.
    With DPA, you can easily line up all of the resources for a specific time to pinpoint the problem, as seen below.

    pastedImage_17.png

 

Summary

 

With VMware’s 500,000 customers and tens of millions of VMs, virtualization is here to stay. Since many on-premises-to-cloud database migrations involve virtualization (e.g., Azure VMs), many of the same challenges that exist on-premises will exist in IaaS environments. DBAs don’t have to be virtualization admins, but they do need to be aware of the environment their database instances run in and the impact those environments have on database performance.

 

That said, I’ve discovered many DPA customers have no idea there’s a purpose-built option for VMware that can be added to the product. It’s easy to see if you have the option by looking for the VIRTUALIZATION tab on the home page.

 

pastedImage_18.png

* This all-in-one view lets you line up all your resources to look for problems on a specific date and time.

 

Our goal at SolarWinds is to listen to our customers, which is why we’ve enhanced the VM option for DPA. If you are a DPA customer, be sure to use our THWACK® feature request page to request and vote on feature enhancements.

 

 

 

 


[1] Gartner Market Guide for Server Virtualization, Published 24 April 2019, ID G00350674

Product Manager

Over the last decade, cybercriminals have gained the necessary resources to make it easier and more lucrative for them to attack small-to-medium-sized businesses. The 2019 Cost of a Data Breach Report not only shows the odds of experiencing a data breach have gone up by a third in less than a decade, but the cost of these data breaches is also on the rise. Additionally, small businesses face disproportionately larger costs than their enterprise counterparts when an attack is successful. This report highlights the importance of SMBs being prepared, now more than ever, to quickly identify and respond to potential cyberattacks.

One common way businesses increase their security posture is by implementing, and using, a Security Information and Event Management tool—SIEM for short. A SIEM solution, at its core, aggregates and normalizes log and event data from across an entire network, making it easier to identify and respond to attacks, compromised data, and security threats.
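
To make "aggregates and normalizes" a little more concrete, here's a toy sketch of mapping two differently formatted log lines onto one common event shape. The log formats, field names, and patterns are invented for illustration; they are not SEM's connectors or schema.

```python
import re

# Two made-up log formats from different sources.
SSH_PATTERN = re.compile(r"(?P<ts>\S+ \S+) sshd: Failed password for (?P<user>\S+) from (?P<ip>\S+)")
WIN_PATTERN = re.compile(r"(?P<ts>\S+ \S+) EventID=4625 user=(?P<user>\S+) src=(?P<ip>\S+)")

def normalize(line):
    """Map a raw log line onto a common event schema, or return None if unrecognized."""
    for source, pattern in (("linux-ssh", SSH_PATTERN), ("windows-security", WIN_PATTERN)):
        match = pattern.search(line)
        if match:
            return {
                "source": source,
                "event_type": "failed_logon",
                "timestamp": match.group("ts"),
                "user": match.group("user"),
                "src_ip": match.group("ip"),
            }
    return None

print(normalize("2019-11-05 08:01:12 sshd: Failed password for admin from 203.0.113.7"))
print(normalize("2019-11-05 08:01:13 EventID=4625 user=admin src=203.0.113.7"))
```

Once events from every source share the same fields, a question like "show me all failed logons for this user" no longer depends on each product's native log format.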

However, many SMBs feel a SIEM solution is out of reach for their organizations for three main reasons:

  1. Complexity
    The complexity starts right away with most traditional SIEM vendors. Connecting different log sources often requires building parsers or writing (and possibly learning) RegEx to ingest and normalize log data. Once the data has been consolidated, recalling the data adds another layer of complexity. For example, wanting to see logins from a particular user can require writing a query in a language created specifically for that SIEM. Additionally, feature bloat often makes it difficult to know how to find answers to simple questions.

  2. Expertise Requirements
    A SIEM is only as effective as the rules put in place to identify, alert on, and respond to potential threats (a simple example of such a rule is sketched after this list). Without a deep understanding of the types of activities captured by logs, and of the behaviors that indicate malicious or risky activity, setting up the rules can be daunting, especially if the SIEM doesn’t come with any pre-built rules. With limited time and a scarcity of available security professionals, setting up a SIEM can seem like too big a project to take on.

  3. Expense
    Aggregating all log and event data in one place is ideal. However, the licensing models of many SIEM solutions can quickly price out SMBs. Many of the most common SIEM solutions on the market are SaaS products, and the price changes based on the log volume being sent to the product. This leads to two main problems: unpredictable pricing and/or IT pros needing to cherry-pick which logs they will collect and store…hope you pick the right ones.
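
As a concrete example of the kind of pre-built rule point 2 refers to, a classic SIEM correlation rule flags repeated failed logons from one source within a short window. Here's a minimal, generic sketch of that logic; it's illustrative only and not SEM's rule engine or syntax.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
THRESHOLD = 5  # failed logons from one source IP within the window

recent = defaultdict(deque)  # src_ip -> timestamps of recent failures

def failed_logon(src_ip, when):
    """Record a failure and return True if the source crosses the alert threshold."""
    events = recent[src_ip]
    events.append(when)
    while events and when - events[0] > WINDOW:
        events.popleft()
    return len(events) >= THRESHOLD

start = datetime(2019, 11, 5, 8, 0, 0)
for i in range(6):
    if failed_logon("203.0.113.7", start + timedelta(seconds=30 * i)):
        print("ALERT: possible brute-force attempt from 203.0.113.7")
```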

At SolarWinds we understand how important it is for IT pros at SMBs to gain valuable time back and automate as much as possible—including threat detection and response. That’s why we built Security Event Manager (SEM). It’s a SIEM solution built for resource-constrained IT pros needing to advance their organization’s security beyond patching, backups, and firewall configurations. SEM is designed to provide the most essential functions of a SIEM to help improve security posture, more easily meet compliance requirements, and reduce the time and complexity of an audit.

How Does SolarWinds Security Event Manager Differ From Other SIEM Products?

  1. Easy to Deploy and Use
    Deployment is flexible via virtual appliance potentially located on-premises or in the public cloud (such as Azure or AWS). Many users report SEM is up and running within fifteen minutes, no professional services required. Log collection and normalization is done by either enabling one or more of the hundreds of pre-built connectors and sending logs to SEM or by deploying the SEM agent.

    It has a simple and clean UI focused on the features SMBs find most important, such as the dashboard to help visualize important trends and patterns in log and event data:
    pastedImage_14.png
    As well as a quick and easy keyword search providing faster log recall without the need to learn specialized query languages:
    pastedImage_15.png

  2. Provides Expertise and Value Out of the Box
    Finding value with the tool will not be an issue. An integrated threat intelligence feed and hundreds of pre-defined filters, rules, and responses, not only make it faster and easier for users to identify threats, but also automate notifications or corrective actions.
    pastedImage_16.png
    Beyond identifying and responding to threats, the pre-built reports make demonstrating compliance a breeze.
    pastedImage_17.png
    The best part is users aren’t confined to out-of-the-box content. As their organization’s needs change and grow, or as they become even better acquainted with the tool, the pre-defined content, visualizations, and reports are flexible.

  3. Priced With SMBs in Mind
    SolarWinds® Security Event Manager has a simple licensing model. SEM is licensed by the number of log-emitting sources sent to the tool. No need to pick and choose which logs to send, and no need to worry about a large influx of logs breaking your budget. Users get all the features of SEM and industry-leading support for a single price. The pricing model is built to scale with the user’s environment, with the price per node dropping at higher tiers. For those looking to monitor workstations, infrastructure, and applications, special discounted pricing is available. Same deal: one price for all features, for each workstation.

If you’re an IT pro at an SMB looking to get a better handle on cyber security or compliance reporting, give SEM a shot. You can download a free, 30-day trial here.

Product Manager

Background

This blog initially started out as an examination of how SolarWinds uses Database Performance Analyzer (DPA) within our own production environment. It now includes not only how our DBA uses DPA, but how other business units within SolarWinds use it and why. It isn’t surprising to find people in IT operations and application development using DPA, since our own customer studies have shown a high number of people outside of the DBA role use it, too.

Recent product-specific studies for DPA showed a high number of DevOps/IT Ops and AppDev roles using the product and an eye-opening, broad customer census exposed even more. In the 2019 THWACK Member Census, we asked over 2,200 IT professionals to select their primary job role and only 2.4% selected DBA. Interestingly, when we asked respondents if they managed or monitored databases, 42.7% said yes.

This brings to light the discussion of the “accidental DBA” and some interesting changes in IT organizations. First is the growth in number of DevOps people who handle database-related tasks. Second is the importance of databases as the platform for most mission-critical applications and why everyone has a keen interest in their availability and performance. And last, but not least, the number of DBAs is going down according to Computer Economics, who has seen the percentage of DBAs relative to total IT staff drop to 2.8% in 2017 from 3.3% in 2013. Our own Head Geek, Thomas LaRock, wrote an article pointing out the number of DBA jobs has stagnated for almost 20 years. On the flipside, Gartner pointed out that DBMS (Database Management Systems) revenue grew an astounding 18.4% to $46 billion in 2018.

Armed with this information, I decided I’d investigate the SolarWinds DBA team and see if any of these trends held true.

Let’s Start With the DBA

As I mentioned, I initially thought I’d interview the DBA team here at SolarWinds to see how we “drink our own champagne,” since I knew DPA was used by our internal IT team. As it turns out, the “DBA Team” is one person. I guess for a company that did $833 million in revenue in 2018 I expected an entire DBA organization, not just one hardworking DBA. But maybe this isn’t the exception?

I learned a lot from our DBA about how she can keep track of over 250 Microsoft SQL Server databases running on a mix of physical and virtual machines. My biggest takeaway from talking to her was that DBAs don’t “monitor databases.” They want to be alerted when there are problems, and they need a product to help them quickly find and resolve problems when they arise. They also want a product to help them optimize their databases proactively.

The first thing we discussed was “what’s important and who is it important to?” Here are the top things SolarWinds uses DPA for and the primary users:

  • Overall database health: DBA and IT Ops
  • Debugging after deployment: AppDev and DBA
  • Ad hoc troubleshooting: DBA and AppDev
  • Capacity planning: DBA

After I learned about the overall database environment (250+ SQL Server databases), I wanted to understand specific, real-world use cases of DPA in action.

DBA Usage Scenarios

So how does the DBA at SolarWinds use DPA? First, she sets up alerts, so she can immediately be sent text notifications from DPA if something goes awry. DPA has had alert notification for a while, but the 2019.4 release made it even easier via a “drag and drop” interface, making alert customization simple. Second, DPA is the first place she goes to when she gets notified about something going wrong, whether it’s an alert, phone call, email, or a help desk ticket opened and assigned to her.

Scenario 1 of 2

In this first real-life scenario, our DBA was alerted to an “assertion check fail” pointing to possible corruption. The SQL Server instance itself created a hard-to-decipher stack dump, and the only noticeable thing she could pick out of it was the process ID.

With this in hand, she went into DPA to the specific time the event occurred in the SQL Server instance. Since DPA provides both real-time and historical data, she was able to drill down to find 1) the session ID executing this query, and 2) the SQL script running and the database. After speaking with the developer who ran the query, she determined it was a problem with SQL Server itself and asked the developer to refrain from running the query until they got the problem resolved by Microsoft.

pastedImage_0.png

*Screenshot the SolarWinds DBA used to find the culprit of the stack dump SQL Server generated.

Scenario 2 of 2

This second use case brings to light how important DPA is for establishing the overall health of a database and for capacity planning. Our DBA could not stress enough how important it was for her to know the baseline of a database instance and associated queries. From the baselines DPA develops, with the help of machine learning, she can know what a typical day looks like and the behavior of typical database activity. This allows her to spot both anomalies and trends.
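
Conceptually, a baseline-and-anomaly check boils down to comparing today's value with what's typical for the same hour of the same day. Here's a deliberately simplified sketch using plain statistics; DPA's anomaly detection uses machine learning, so treat this only as an illustration of the idea.

```python
from statistics import mean, stdev

def is_anomalous(history, current, sigmas=3.0):
    """Flag a wait-time value that falls far outside the historical baseline.
    `history` holds past observations for the same hour of day."""
    baseline = mean(history)
    spread = stdev(history)
    return abs(current - baseline) > sigmas * spread

# Past Mondays, 8 a.m. total wait (seconds) vs. today's observation.
past_wait = [120, 135, 110, 128, 140, 125]
print(is_anomalous(past_wait, 480))  # True: today's wait is far above baseline
```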

Regarding capacity planning, she uses DPA to monitor the utilization and performance of applications and make note of trends she uses for future capacity requirements such as new or additional servers. Luckily, SolarWinds does a quarterly two-week freeze on new applications and changes, and this two-week period gives her a chance to go through DPA reports and proactively tune the environment. DPA’s anomaly detection powered by machine learning is a great way to graphically see the biggest opportunities for proactive optimization.

pastedImage_1.png

*This resource tab in DPA is a favorite of our DBA because it gives her a good overview of server resources being used.

Our DBA believes DPA will be even more useful as SolarWinds starts to migrate databases to Azure PaaS. As she stated, staying on top of performance issues like poorly written SQL and poorly performing tables doesn’t go away, and the cost of making mistakes, especially those consuming resources, can lead to spikes in usage charges.

Application Development and DPA

As I mentioned at the beginning, I learned a lot about how DPA is used at SolarWinds and the various people and departments using it. The application development (AppDev) team is one of the bigger teams in need of the data DPA provides. Why? Because they, along with our DBA, are constantly deploying changes and want to see the difference.

For example, is the SQL query running slower or faster than before? As previously mentioned, some people are “accidental DBAs,” so if the query they implemented ran fine on a QA instance but in production performs poorly, they need to know why. Case in point, this exact scenario happened recently and was due to a missing index DPA quickly pointed out. As our DBA stressed, for someone not very experienced with index recommendations, the tuning advisors in DPA can be a life-saver.

pastedImage_7.png

*One of the most popular DPA pages for before-and-after comparisons is the overall waits view, which is great for seeing changes in performance after a deployment.

Finally, IT Operations

At SolarWinds, IT Operations (IT Ops) is where the buck stops for overall system availability, and just like our DBA, they make extensive use of alerts. Depending on the alert, they may send a priority 3 email when something has reached a certain threshold. But if SQL Server were down, they would send an email as well as page Opsgenie, which then goes to the primary person on call and posts a message on Microsoft Teams. The IT Ops group also has certain alerts integrated with SolarWinds® Service Desk to automatically open tickets.

But what about databases and their health…does IT Ops care? The answer is yes because they rely on the DPA integration with SolarWinds Server & Application Monitor (SAM) to find the cause of performance issues on servers or when someone complains about application performance. Since DPA and SAM integrate with the SolarWinds Orion® Platform, you can navigate seamlessly between the products.

For example, they used the SAM integration to track a CPU spike on a server to a SQL Server database instance in a critical state. In this case, they immediately reached out to the SolarWinds DBA because they could tell the issue with the server was related to the database. However, if the DBA is unavailable, they rely on the suggestions and recommendations in DPA to diagnose the problem and take action or provide further documentation for either our DBA or AppDev.

Just as DBA and AppDev look for signs of abnormality, IT Ops looks at historical trends to find issues that may correlate to database issues. The integration of SAM and DPA makes this simple.

pastedImage_8.png

*IT Ops uses this page in Server & Application Monitor to see trends and then drill down and isolate the root cause. SAM’s integration with DPA makes this simple.

Summary

As stated in the introduction, the role of the DBA is changing and many people without a DBA title are involved with the performance of database applications. With the movement of database instances to IaaS and PaaS implementations, the ability to optimize, find, and resolve performance issues doesn’t go away. In some ways it becomes more important due to the potential impact on OpEx (aka your monthly Azure bill).

Product Manager

Change control. In theory it works. However, there’s always one person who thinks the process doesn’t apply to them. Their justification for going rogue may sound something like, “There’s no time to wait, this has to be done now,” and, “This is a small change, it won’t impact anything else,” or maybe, “This change will make things better.”

But at the end of the day, those changes inevitably end up crashing a service, slowing application performance, or even worse, opening new vulnerabilities. The call will come in: something’s broken, and magically no one will know why on earth it’s happening, and they certainly won’t be able to remember if any changes occurred…or who made a change. There goes the rest of your day, looking for the root cause of an issue created by one of your own coworkers.

Recently, Head Geeks Thomas LaRock sqlrockstar and Leon Adato adatole hosted a THWACKcamp session on this exact topic. In their scenario the culprit was “Brad the DBA.” At SolarWinds, we understand this all-too-common scenario and have a tool designed to help.

SolarWinds® Server Configuration Monitor (SCM) provides an easy-to-use and affordable way to track when server or application configuration changes are being made, who’s making the changes, and what the differences are between the old configuration and the new configuration. It detects, tracks, and alerts on changes to things like hardware, software, operating systems, text and binary files, the Windows Registry, and script outputs on Windows® and Linux® servers.

pastedImage_0.png
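
Conceptually, this kind of change detection amounts to taking a baseline fingerprint of each monitored item and comparing it on every poll. Here's a stripped-down sketch of that idea for text files, using content hashing; it's an illustration of the concept, not how SCM is implemented, and the paths are hypothetical.

```python
import hashlib
from pathlib import Path

def fingerprint(path):
    """Hash a file's contents so any edit shows up as a different digest."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

baseline = {}  # path -> last known digest

def detect_changes(paths):
    """Return the files whose contents differ from the stored baseline."""
    changed = []
    for path in paths:
        digest = fingerprint(path)
        if baseline.get(path) not in (None, digest):
            changed.append(path)
        baseline[path] = digest  # update the baseline for the next poll
    return changed

# Illustrative usage:
# watched = ["/etc/nginx/nginx.conf", "/opt/app/settings.ini"]
# print(detect_changes(watched))
```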

Additionally, SCM is an Orion Platform-based module, meaning you can quickly correlate configuration changes with infrastructure and application performance metrics in a single view, helping confirm or rule out the possibility of a configuration change being the culprit.

pastedImage_1.png

These capabilities help provide you with the visibility needed to not only remediate issues faster but also hold non-process-abiding team members accountable for their actions. If you’re tired of the shenanigans created by your colleagues not following the change control process for your servers and applications, check out a free, 30-day trial of Server Configuration Monitor. And just for fun, if you have a good story of how “Brad” broke your day, feel free to share below!

Product Manager

PASS Summit 2019 is here and SolarWinds will be at the conference, booth #416, November 5 – 8 in Seattle, Washington.

We’ll be showcasing the latest release of SolarWinds® Database Performance Analyzer (DPA), including our just-announced support for Azure® SQL Database Managed Instance and SQL Server® 2019. Whether you’re currently using DPA for your cross-platform database performance monitoring or you’re a “casual DBA,” stop by our booth for a demo to see the great new features we’ve added to this release.

And if you’re currently using any other SolarWinds products capable of integrating with the Orion® Platform, such as Server & Application Monitor (SAM) or Virtualization Manager (VMAN), ask us how DPA seamlessly integrates with other products to give you a complete end-to-end view of your database applications.

Lastly, we’ve got two SolarWinds-sponsored events at the conference you should put on your calendar. One is the first timer’s reception we’re sponsoring Tuesday, November 5, from 4:45 – 6 p.m. in ballroom 6E at the convention center. The second is our presentation “SQL Server Performance Analysis Powered by Machine Learning,” Wednesday, November 6, at 1:30 p.m. in room 618 at the convention center.

Stop by and say hi. In addition to product demos, we’ll be giving away some cool swag.

Level 9

After all the education, all the measures, there are two almost inevitable truths in IT security: Some users will reuse their passwords and data breaches will happen—leading to exposed and misused credentials.

These two facts combined spell trouble. Account takeover kind of trouble.

Your own password strategy is likely solid. You use strong, unique passwords and enable multi-factor authentication whenever you can—modern-day table stakes, right?

Do you think everyone in your organization is as careful? You never know whether their Outlook or Salesforce password is the same one they used to sign up for a Smash Mouth fan forum running on vBulletin, last updated in 2011.

We’re not even talking about your organization when it comes to data breaches—the IT pros do their best to ensure breaches won’t happen. But that vBulletin forum is easy pickings. Once breached, the hacker has half of what they need—the password. Even serious services and organizations are regularly breached and pilfered.

Mix these two together, add some credential stuffing tools, and you have an account takeover—using legitimate, but stolen, credentials to access your systems and data.

What to do about it? Other than to think “We’re not an interesting-enough target” (false) or it’s not something to worry about (false).

We’re excited to have a new tool to help you get some power back into your hands: SolarWinds® Identity Monitor. We’ve partnered with security experts at SpyCloud to give you a chance to act before the bad actors do.

In true SolarWinds fashion, Identity Monitor is an easy-to-use product:

You sign up with your business email address, add any other company email domains to a watchlist, and then get notified whenever any emails and credentials belonging to your organization appear in a data breach. You also have access to extensive past breach history for your monitored domains.

With an early notification, you can act and protect your accounts from account takeover attempts, for example, by forcing a password reset.

Ready to see where you stand? Start with a free check of your email breach exposure and see the stats of how many personal records in your domain were exposed and when. Fingers crossed!

Level 8

It sounds easy enough to an IT pro: is it an applications ticket or a hardware ticket?

Simple enough. Why is this one question so important? For starters, it’s how you track the performance and success of the teams who provide internal support. In addition, collecting simple data points like “category” and “subcategory” can drive a better, faster experience for all the employees in your organization.

The problem is, the sales manager (or accountant, or creative designer) doesn’t know the difference between an application support issue and hardware breakdown, and they’re not familiar with IT’s processes that rely on such information.

That’s where artificial intelligence (AI) can help. SolarWinds® Service Desk uses AI in a few different ways; suggested data points for tickets were the first AI-powered functionality we introduced. Suggested categories and subcategories provide users some direction based on keywords within the subject or description of their ticket and the composite history of ticket data in the system. The goal is for requesters, regardless of their tech understanding, to enter complete and accurate ticket data thanks to those suggestions.
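
Under the hood, a suggestion like this can be approximated as keyword scoring against historical ticket data. The sketch below is a toy illustration of that idea with invented keywords and counts, not the actual Service Desk AI.

```python
from collections import Counter

# Toy history: keyword -> how often it appeared with each (category, subcategory).
HISTORY = {
    "laptop": Counter({("Hardware", "Laptop"): 42, ("Software", "OS"): 3}),
    "battery": Counter({("Hardware", "Laptop"): 17}),
    "outlook": Counter({("Software", "Email"): 35}),
}

def suggest(subject):
    """Suggest the (category, subcategory) most often seen with the subject's keywords."""
    scores = Counter()
    for word in subject.lower().split():
        scores.update(HISTORY.get(word, {}))
    top = scores.most_common(1)
    return top[0][0] if top else None

print(suggest("Laptop battery will not charge"))  # ('Hardware', 'Laptop')
```

A real system blends many more signals, but the effect is the same: the requester sees a sensible default instead of an empty drop-down.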

This data can drive automated ticket routing and priority. It can empower your teams to carve out unique SLAs for specific types of tickets (and trust the tickets are categorized correctly). It can make the difference between granular, accurate performance data and, well, this:

AI1.png

This might look familiar: the dreaded “other” category. When users (or agents, for that matter) don’t know how to identify the category, your reports will look something like this. It’s time to say goodbye to this useless chart. AI will see it to the door by suggesting the correct data points up front.

Let’s look at some use cases for AI in action.

Powering Ticket Automations

One of the most important sections in the configuration of a SolarWinds Service Desk environment is the automations engine. This is where you’ll identify types of tickets you can route directly to a certain group, keywords indicating high priority, or breaches requiring specific escalation processes.

Those automated actions depend on data collection when a ticket is entered. The information the user enters will correspond directly with an action, so it needs to be correct for an automation rule to work.

This is where AI can help. As you can see from the next example, there are suggested categories and subcategories as soon as they click those required drop-downs. The suggestions are based on the information they’ve already entered combined with historical data from the service desk environment. When they choose “hardware” and “laptop” with the help of those AI-powered suggestions, the custom fields appear for “battery type” and “type of laptop.”

AI2.gif

Why is this important?

You can create an automation rule to route these tickets directly to the appropriate support group. The AI-powered suggestion unlocks those custom fields to help you pinpoint the exact nature of the issue.

In the example below, you’ll see an automation rule to route Mac device issues directly to the “Mac Device Technical Support” group.

AI3.png

AI4.png

With the help of AI-powered suggestions, you’ll receive the crucial piece of information driving the automation rule. Now these tickets will skip the general queue and arrive instantly with the “Mac Device Technical Support” group. You’ve saved the requester time waiting for a resolution and your IT team time parsing through a general ticket queue.
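
Conceptually, an automation rule like the one above is a set of conditions matched against the ticket's fields plus an action to take. The sketch below expresses that idea generically; the field names, values, and group name come from the example, but this is not the Service Desk configuration format.

```python
# A routing rule expressed as data: if every condition matches, apply the action.
RULES = [
    {
        "conditions": {"category": "Hardware", "subcategory": "Laptop", "type_of_laptop": "Mac"},
        "action": {"assign_group": "Mac Device Technical Support"},
    },
]

def route(ticket):
    """Return the action of the first rule whose conditions all match the ticket."""
    for rule in RULES:
        if all(ticket.get(field) == value for field, value in rule["conditions"].items()):
            return rule["action"]
    return {"assign_group": "General Queue"}

ticket = {"category": "Hardware", "subcategory": "Laptop", "type_of_laptop": "Mac"}
print(route(ticket))  # {'assign_group': 'Mac Device Technical Support'}
```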

Self-Service and Suggested Request Forms

Requesters may not realize the benefit of the suggested categories because they’re unfamiliar with how your teams use the data. But in this next example of AI in the service desk, the benefits will be plainly evident.

This is where your service catalog and knowledge base reach their maximum potential. For a long time, it was very difficult for IT to encourage users to leverage self-service articles or request forms driving workflows. Simply put, some users will always default to creating a ticket, no matter what resources might be available through the portal.

When IT replies to the ticket with a link to a request form, the user now needs to complete the form after already submitting a ticket. That’s a poor experience, and it’s time wasted on both ends.

To simplify the experience, you can meet them with suggested resources wherever they are in the portal. If they like to use the search bar, suggested service catalog forms and self-service articles will appear. If a requester is the “always submit a ticket” type, AI-powered suggestions will pop up with request forms or knowledge articles as they fill out the subject line of the ticket.

Not only are you anticipating the service they need, but you’re giving them every single opportunity to leverage the resources your team has made available.

AI5.gif

So, for now, there are three major benefits AI has brought to the service desk:

1) Complete and accurate data collection to drive automation and reporting

2) Access to appropriate request forms, driving automated workflows

3) Opportunities to self-resolve

As this technology grows, so will the possibilities for proactive measures to save time and avoid disruption to employees who depend on the technology you support.

Level 9

In the past, the importance of access rights management had to wait in line behind trending topics like hybrid infrastructures, digitalization, cloud, and the latest new tools the C-level wants to have and implement. As a result, access rights management in companies often lacks transparency, is organically grown, and doesn’t follow best practices like the principle of least privilege.

Even though managing user access rights is an essential part of every administrator’s work, there are different ways of doing it. However, looking at all the systems, tools, and scripts out there, most admins share the same big pain points.

Earlier this year, we asked our THWACK® community about their biggest pain points when it comes to access rights management and auditing. Turns out the biggest factors are:

pastedImage_1.png

  1. Moving, adding, or changing permissions
  2. Running an audit/proving compliance
  3. Understanding recursive group memberships

1.      Moving, Adding, or Changing Permissions

The flexibility of today’s working world requires a well thought-out user provisioning process. Whether for a new user, a short-term assignment, department changes, or temporary projects, the expectations of an IT group are to accurately and quickly provision users while helping to maintain data security.

IT departments are typically responsible for securing a network, managing access to resources, and keeping an overview of permissions and access rights policies. Therefore, they should use a provisioning framework. SolarWinds® Access Rights Manager (ARM) is designed to help address the user provisioning process across three phases—joiners, movers, and leavers.

SolarWinds Access Rights Manager not only helps automate the joiner or initial provisioning phase, it also allows admins to quickly perform changes and remediate access rights while enabling data owners.

Creating and Moving User Access Permissions

With ARM, you can control the creation of new user accounts, rights management, and account details editing.

Its user provisioning tool allows you to set up new users typically within seconds. Users are generated in a standardized manner and in conformity with the roles in your company. The access rights to file servers, SharePoint sites, and Exchange as defined in the AD groups are issued at the same time. ARM generates a suitable email account so the new colleague can start work immediately. You can schedule the activation to prepare for the event in the future or to limit the access period for project work. Whether help desk or data owner, participants work with a reduced, simple interface in both cases.

All access rights are set up in a few steps.

On the start screen under “User Provisioning,” you can choose from the most important quick links for:

  • Creating a user or a group
  • Editing group memberships
  • Editing access rights for resources

pastedImage_3.png

By choosing “Create new user or group,” ARM allows you to create a user or group based on preset templates. These user and group templates have to be created individually one time after installing ARM.

pastedImage_12.png

pastedImage_17.png

For further information please download our whitepaper: Joiner, Mover, Leaver: User Provisioning With SolarWinds Access Rights Manager

2.      Running an Audit/Proving Compliance

pastedImage_22.png

With ARM, you can either create reports on users/groups or resources along with further filters.

Just looking at reports for Active Directory, you could create views for:

  • Where users and groups have access
  • Employees of manager
  • Display user account details
  • Find inactive accounts
  • OU members and group memberships
  • User and group report
  • Identify local accounts
  • And many more

While creating a report, you can set different selections such as the users or groups you’d like to report on and the resources you would like details about.

pastedImage_33.png

Additionally, you can set up scheduled reports, which you can send directly via email to yourself, your auditor, or your direct manager if needed.

pastedImage_38.png

To gain more insight on the reporting capabilities of ARM, please see our whitepaper: Top 7 Audit-Prep Reports

3.      Understanding Recursive Group Memberships

Groups can be members of other groups. Active Directory allows "children" to become "parents" within their own family tree. If the nested group structure loops in a circular way, group membership assignments become ineffective and nonsensical. Through these recursions or circular nested groups, every user who is a member of any of the recursive groups is granted all the access rights of all the groups. The consequence is a confusing mess of excessive access rights.
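
Under the hood, finding these circular nested groups is a cycle-detection problem over the "member of" graph. Here's a small, generic sketch with made-up group names; it illustrates the idea rather than ARM's implementation.

```python
# Group -> groups it is a member of (a made-up nesting that contains one loop).
MEMBER_OF = {
    "HelpDesk": ["IT-Staff"],
    "IT-Staff": ["All-Employees", "Admins"],
    "Admins": ["HelpDesk"],  # circular: Admins -> HelpDesk -> IT-Staff -> Admins
    "All-Employees": [],
}

def find_cycles(graph):
    """Return membership chains that loop back on themselves.
    (Each rotation of the same loop is reported; fine for a sketch.)"""
    cycles = []

    def walk(group, path):
        if group in path:
            cycles.append(path[path.index(group):] + [group])
            return
        for parent in graph.get(group, []):
            walk(parent, path + [group])

    for group in graph:
        walk(group, [])
    return cycles

for cycle in find_cycles(MEMBER_OF):
    print(" -> ".join(cycle))
```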

ARM automatically identifies all recursions in your system. We highly recommend removing the recursion by breaking the chain of circular group memberships.

pastedImage_43.png

ARM not only allows you to see circular or recursive groups, but also lets you directly correct group memberships and dissolve recursions.

To keep an eye on the most common access-based risk levels, ARM provides a risk assessment dashboard with the eight biggest risk factors and lets you correct your individual risk levels right away.

pastedImage_48.png

Get your free ARM trial and do your risk assessment here.

Level 10

We are happy to announce the arrival of the brand-new SolarWinds NAT Lookup free tool.

We know how frustrating it can be when your users can’t even get through the firewall to their own network. SolarWinds NAT Lookup helps simplify the network address translation lookup process to help get your users beyond their firewall translation issues, prevent overlapped policies that cause incorrect translations, and effectively troubleshoot address issues. 

So, what exactly can you do with this new addition to the SolarWinds free tool catalog?

  1. Search an IP address across one or multiple Palo Alto firewalls.
     Quickly perform an IP lookup on one or multiple Palo Alto firewalls to verify IP translation information.

  2. Troubleshoot and verify NAT policy configuration.
     Render complete lists of all NAT policies that apply to a searched address to spot overlapping policies, cross-reference the policy configuration to live session traffic, and see the order of overlapping NAT policies.

  3. See live session traffic per firewall for the translated address.
     Gain insight into live session traffic per firewall for each translated address to help ensure the policy configuration matches observed behavior and performance.

  4. Export information for record keeping.
     Keep historical records of each executed search by exporting the policy configurations from the tool into CSV file format.

Additional Features:

  • Removes the need for users to have direct access to firewalls
  • Easy distribution to other IT groups instead of granting direct access to your sensitive firewalls
  • Helps users identify dedicated translated addresses

How do you plan to use SolarWinds NAT Lookup? We’d love to hear from you, so feel free to post your use cases in the comments below.

If you’d like to get the nitty-gritty, in-depth info about NAT Lookup and how you can make the most of it, check out this article.

Level 9

Synthetic user monitoring is a technique that simulates user transactions—or common paths on websites—so the administrator can watch for performance issues. These transactions are meant to represent how a user might be experiencing the site. For instance, is a potential customer getting an error when they add an item to their cart? Is a specific page loading slowly or not loading at all? These are things that can affect your bottom line and result in unplanned fire drills.

Synthetic user monitoring should not be confused with Real User Monitoring. Real User Monitoring captures and analyzes transactions from real users on a site. It helps you understand load times for your pages from browsers in their actual locations.

These approaches provide different perspectives on web performance. Each has its benefits, but today—in honor of the release of Web Performance Monitor 3.0—we’re going to focus on situations when synthetic user monitoring is a good choice.

Find Performance Issues Before They Cause Problems for Your Users

IT infrastructure monitoring tools are great at telling you if a server or a service is up or down, but users might still be frustrated even if these things look OK. Synthetic user experience monitoring tools let you see if an overall transaction is working (can a user purchase something from your site?) or if a certain step is having trouble (when I click “buy” my payment processing is hanging). Once you’re alerted, you can go into troubleshooting mode with the specifics of what your users are seeing to minimize the impact. Plus, you can continuously run these tests from multiple locations to ensure things are working where your users are. 
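
In its simplest form, a synthetic check is a scripted request run on a schedule with a stopwatch and an alert threshold. Here's a bare-bones sketch using Python's requests library; the URL and thresholds are placeholders, and a real tool like WPM records full multi-step transactions in a browser rather than a single GET.

```python
import time
import requests

def synthetic_check(url, timeout_s=10, slow_s=3.0):
    """Run one synthetic 'transaction': fetch the page, time it, classify the result."""
    start = time.monotonic()
    try:
        response = requests.get(url, timeout=timeout_s)
        elapsed = time.monotonic() - start
        if response.status_code >= 400:
            return ("DOWN", elapsed, response.status_code)
        if elapsed > slow_s:
            return ("SLOW", elapsed, response.status_code)
        return ("OK", elapsed, response.status_code)
    except requests.RequestException:
        return ("DOWN", time.monotonic() - start, None)

# Run this from each monitoring location on a schedule and alert on DOWN/SLOW.
print(synthetic_check("https://www.example.com/"))
```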

Benchmark Your Site’s Performance to Identify Areas for Improvement

As mentioned, synthetic user experience monitoring tools can watch your websites from multiple locations at frequencies of your choice. Seeing this data over time can help you identify areas to optimize going forward. Waterfall charts can be particularly helpful to pinpoint performance bottlenecks over time.

Monitor the Performance of Critical SaaS Applications From Inside Your Firewall

Most companies rely on third-party SaaS applications to run some aspects of their business. For instance, your sales team may be using a SaaS CRM solution to drive and track their daily activities. It’s critical to know if your coworkers are having issues getting what they need. While you don’t own the app, you’re the one they’ll come to when they have issues. A common scenario is setting up a transaction to make sure a valid user can log in successfully, so you’re alerted if it fails.

Knowing about failures or performance issues before your users can save you time and frustration. Synthetic user experience monitoring can help when it comes to websites and web-based applications. How have you used it? Comment below and let us know.

Read more
5 12 2,135
Level 10

Slow websites on your mobile device are frustrating when you’re trying to look up something quickly. When a page takes forever to load, it’s often due to a spotty network connection or a website that is overly complicated for a phone. Websites that load many images or videos can also eat up your data plan. Most people have a monthly cap on the amount of data they can use, and it can be expensive to pay an overage fee or upgrade your plan.

Can switching to a different browser app truly help websites load faster and use less data? We’ll put the most popular mobile browsers to the test to see which is the fastest and uses the least data. Most people use their phone’s default browser app, like Safari on iPhone or Chrome on Android. Other browsers, like Firefox Focus and Puffin, claim to be better at saving data. Let’s see which one comes out on top in our testing.

How We Benchmark

We’ll look specifically at page-load performance by testing three popular websites with different styles of content. The first will be the Google home page, which should load quickly as Google designed it to be fast. Next, we’ll measure the popular social media website Reddit. Lastly, we’ll test BuzzFeed, a complex website with many ads and trackers.

To conduct these tests, we’ll use an Apple iPhone 7. (We may look at other devices, such as Android phones, in future articles.) We’ll use the browsers with default settings and clear any private browsing data so cached data won’t change the results.

Since we don’t have access to the browser developer tools we’d typically have on a desktop, we’ll need to use a different technique. One way is to time how long it takes to download the page, but some websites preload data in the background to make your next click load faster. From the user’s perspective, this shouldn’t count toward the page-load time because it happens behind the scenes. A better way is to record a video of each page loading. We can then play them back and see how long each took to load all the visible content.

To see how much data each browser used, we’ll use something called a “proxy server” to monitor the phone’s connections. Normally, phones load data directly through the cellular carrier’s LTE connection or through a router’s Wi-Fi connection. A proxy server acts like a man in the middle, letting us count how much data passes between the website and the phone. It also lets us see which websites it loaded data from and even the contents of the data.

We’ll use the proxy server software called Fiddler. This tool also enables us to decrypt the HTTPS connection to the website and spy on exactly which data is being sent. We configured it for iOS by installing a Fiddler root certificate on our phone so the phone trusts the proxy. Fiddler terminates the SSL connection with the external website, then encrypts the data to our phone using its own root certificate. It allows us to see statistics on which sites were visited, which assets were loaded, and more.
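Fiddler isn’t the only tool that can do this kind of accounting. As a rough illustration of the same byte-counting idea, here is a minimal Python sketch of an addon for the open-source mitmproxy tool (an alternative we’re assuming for illustration, not the setup used for these tests), which tallies response bytes per host:

# count_bytes.py -- run with: mitmdump -s count_bytes.py
from collections import Counter

from mitmproxy import http

bytes_per_host = Counter()

def response(flow: http.HTTPFlow) -> None:
    # Tally the size of every response body, grouped by the host it came from.
    body = flow.response.raw_content or b""
    bytes_per_host[flow.request.pretty_host] += len(body)

def done() -> None:
    # Print a summary when mitmdump shuts down.
    for host, size in bytes_per_host.most_common():
        print(f"{host}: {size / 1024:.1f} KB")

This counts only response bodies, so the totals will differ somewhat from Fiddler’s numbers, but the per-host breakdown tells a similar story.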



The Puffin browser made things more challenging because we were unable to see the contents of pages after installing the Fiddler root certificate. It’s possible Puffin uses a technique called certificate pinning. Nevertheless, we were still able to see the number of bytes being sent over the connection to our phone and which servers it connected to.

Which Browser Has the Best Mobile Performance?

Here are the results of measuring the page-load time for each of the mobile browsers against our three chosen websites. Faster page load times are better.

Browser       | Google.com | Reddit.com | Buzzfeed.com
Safari        | 3.48s      | 5.50s      | 8.67s
Chrome        | 1.03s      | 4.93s      | 5.93s
Firefox       | 1.89s      | 3.47s      | 3.50s
Firefox Focus | 2.67s      | 4.90s      | 5.70s
Puffin        | 0.93s      | 2.20s      | 2.40s

The clear winner in the performance category is Puffin, which loaded pages about twice as fast as most other browsers. Surprisingly, it even loaded Google faster than Chrome. Puffin claims the speed is due to a proprietary compression technology. Most modern browsers support gzip compression, but it’s up to site operators to enable it. Puffin can compress all content by passing it through its own servers first. It can also downsize images and videos so they load faster on mobile.

Another reason Puffin was so much faster is because it connected to fewer hosts. Puffin made requests to only 14 hosts, whereas Safari made requests to about 50 hosts. Most of those extra hosts are third-party advertisement and tracking services. Puffin was able to identify them and either remove them from the page or route calls through its own, faster servers at cloudmosa.net.

Puffin                       | Safari
vid.buzzfeed.com: 83         | img.buzzfeed.com: 51
google.com: 9                | www.google-analytics.com: 16
www.google.com: 2            | www.buzzfeed.com: 14
en.wikipedia.org: 2          | tpc.googlesyndication.com: 9
pointer2.cloudmosa.net: 2    | securepubads.g.doubleclick.net: 7
data.flurry.com: 2           | pixiedust.buzzfeed.com: 7
www.buzzfeed.com: 2          | vid.buzzfeed.com: 6
pivot-ha2.cloudmosa.net: 1   | cdn-gl.imrworldwide.com: 6
p40-buy.itunes.apple.com: 1  | www.facebook.com: 6
gd11.cloudmosa.net: 1        | sb.scorecardresearch.com: 3
gd10.cloudmosa.net: 1        | cdn.smoot.apple.com: 3
gd9.cloudmosa.net: 1         | pagead2.googlesyndication.com: 3
collector.cloudmosa.net: 1   | video-player.buzzfeed.com: 3
www.flashbrowser.com: 1      | gce-sc.bidswitch.net: 3
                             | secure-dcr.imrworldwide.com: 3
                             | connect.facebook.net: 3
                             | events.redditmedia.com: 3
                             | s3.amazonaws.com: 2
                             | thumbs.gfycat.com: 2
                             | staticxx.facebook.com: 2
                             | id.rlcdn.com: 2
                             | i.redditmedia.com: 2
                             | googleads.g.doubleclick.net: 2
                             | videoapp-assets-ak.buzzfeed.com: 2
                             | c.amazon-adsystem.com: 2
                             | buzzfeed-d.openx.net: 2
                             | pixel.quantserve.com: 2
                             | … 20 more omitted

It’s great Puffin was able to load data so quickly, but it raises some privacy questions. Any users of this browser are giving CloudMosa access to their entire browsing history. While Firefox and Chrome let you opt out of sending usage data, Puffin does not. In fact, it’s not possible to turn this tracking off without sacrificing the speed improvements. The browser is supported by ads, although its privacy policy claims it doesn’t keep personal data. Each user will have to decide if he or she is comfortable with this arrangement.

Which Browser Uses the Least Mobile Data?

Now let’s look at the amount of data each browser uses. Again, we see surprising results:

Browser       | Google.com | Reddit.com | Buzzfeed.com
Safari        | 0.82MB     | 2.89MB     | 4.22MB
Chrome        | 0.81MB     | 2.91MB     | 5.46MB
Firefox       | 0.82MB     | 2.62MB     | 3.15MB
Firefox Focus | 0.79MB     | 2.61MB     | 3.13MB
Puffin        | 0.54MB     | 0.17MB     | 42.2MB

Puffin was the clear leader for loading google.com and it dominated reddit.com by a factor of 10. It claims it saved 97% of data usage on reddit.com.



However, Puffin lost on buzzfeed.com by a factor of 10. In Fiddler, we saw that it made 83 requests to vid.buzzfeed.com. It appears it was caching video data in the background so videos would play faster. While doing so saves the user time, it ends up using way more data. On a cellular plan, this approach could quickly eat up a monthly cap.

As a result, Firefox Focus came in the lead for data usage on buzzfeed.com. Since Firefox Focus is configured to block trackers by default, it was able to load the page using the least amount of mobile data. It was also able to avoid making requests to most of the trackers listed in the Buzzfeed section above. In fact, if we take away Puffin, Firefox Focus came in the lead consistently for all the pages. If privacy is important, Firefox Focus could be a great choice for you.

How to Test Your Website Performance

Looking at the three websites we tested, we see an enormous difference both in page-load time and in the amount of data used. This matters because higher page-load times are correlated with higher bounce rates and lower online purchases.

Pingdom® makes it even easier to test your own website’s performance with page speed monitoring. It gives you a report card showing how your website compares with others in terms of load time and page size.

To get a better idea of your customer’s experience, you can see a film strip showing how content on the page loads over time. Below, we can see that Reddit takes about two seconds until it’s readable. If we scroll over, we’d see it takes about six seconds to load all the images.

The SolarWinds® Pingdom® solution also allows us to dive deeper into a timeline view showing exactly which assets were loaded and when. The timeline view helps us see if page assets are loading slowly because of network issues or their size, or because third parties are responding slowly. The view will give us enough detail to go back to the engineering team with quantifiable data.

Pingdom offers a free version that gives you a full speed report and tons of actionable insights. The paid version also gives you the filmstrip, tracks changes over time, and offers many more website monitoring tools.

Conclusion

The mobile browser you choose can make a big difference in terms of page-load time and data usage. We saw that the Puffin browser was able to load pages much faster than the default Safari browser on an Apple iPhone 7. Puffin also used less data to load some, but not all, pages. However, for those who care about privacy and saving data on their mobile plan, Firefox Focus may be your best bet.

Because mobile performance is so important for customers, you can help improve your own website using the Pingdom page speed monitoring solution. This tool will give you a report card to share with your team and specific actions you can take to make your site faster.

Read more
0 3 1,334
Level 9

It’s easy to recognize problems in Ruby on Rails, but finding each problem’s source can be a challenging task. A problem due to an unexpected event could result in hours of searching through log files and attempting to reproduce the issue. Poor logs will leave you searching, while a helpful log can assist you in finding the cause right away.

Ruby on Rails applications automatically create and maintain basic text logs for each environment, such as development, staging, and production. You can easily format and add extra information to the logs using open-source logging libraries, such as Lograge and Fluentd. These libraries work well for small applications, but as you scale across many servers, developers need to aggregate logs to troubleshoot problems across all of them.

In this tutorial, we will show you how Ruby on Rails applications handle logging natively. Then, we’ll show you how to send the logs to SolarWinds® Papertrail™. This log management solution enables you to centralize your logs in the cloud and provides helpful features like fast search, alerts, and more.

Ruby on Rails Native Logging

Ruby offers a built-in logging system. To use it, simply include the following code snippet in your environment.rb (development.rb/production.rb). You can find environments under the config directory of the root project.

config.logger = Logger.new(STDOUT)

Or you can include the following in an initializer:

Rails.logger = Logger.new(STDOUT)

By default, each log is created under #{Rails.root}/log/ and the log file is
named after the environment in which the application is running. The default format gives basic information that includes the date/time of log generation and description (message or exception) of the log.

D, [2018-08-31T14:12:44.116332 #28944] DEBUG -- : Debug message
I, [2018-08-31T14:12:44.117330 #28944]  INFO -- : Test message
F, [2018-08-31T14:12:44.118348 #28944] FATAL -- : Terminating application, raised unrecoverable error!!!
F, [2018-08-31T14:12:44.122350 #28944] FATAL -- : Exception (something bad happened!):

Each log line also includes the severity, otherwise known as log level. The log levels enable you to filter the logs when outputting them or when monitoring problems, such as errors or fatal. The available log levels are :debug, :info, :warn, :error, :fatal, and :unknown. These are
converted to uppercase when output in the log file.

Formatting Logs Using Lograge

The default logging in Ruby on Rails during development or in production can be noisy, as you can see below. It also records a limited amount of
information for each page view.

I, [2018-08-31T14:37:44.588288 #27948]  INFO -- : method=GET path=/ format=html controller=Rails::WelcomeController action=index status=200 duration=105.06 view=51.52 db=0.00 params={'controller'=>'rails/welcome', 'action'=>'index'} headers=#<ActionDispatch::Http::Headers:0x046ab950> view_runtime=51.52 db_runtime=0

Lograge adds extra detail and uses a format that is less human readable, but more useful for large-scale analysis through its JSON output option. JSON makes it easier to search, filter, and summarize large volumes of logs. The discrete fields facilitate the process of searching through logs and filtering for the information you need.

I, [2018-08-31T14:51:54.603784 #17752]  INFO -- : {'method':'GET','path':'/','format':'html','controller':'Rails::WelcomeController','action':'index','status':200,'duration':104.06,'view':51.99,'db':0.0,'params':{'controller':'rails/welcome','action':'index'},'headers':'#<ActionDispatch::Http::Headers:0x03b75520>','view_runtime':51.98726899106987,'db_runtime':0}

In order to configure Lograge in a Ruby on Rails app, you need to follow some simple steps:

Step 1: Find the Gemfile under the project root directory and add the following gem.

gem 'lograge'

Step 2: Enable Lograge in each relevant environment (development, production, staging) or in an initializer. Both the environment files and the initializers live under the config directory of your project.

# config/initializers/lograge.rb
# OR
# config/environments/production.rb
Rails.application.configure do
  config.lograge.enabled = true
end

Step 3: If you’re using Rails 5’s API-only mode and inherit from ActionController::API, you must define it as the controller base class that Lograge will patch:

# config/initializers/lograge.rb
Rails.application.configure do
  config.lograge.base_controller_class = 'ActionController::API'
end

With Lograge, you can include additional attributes in log messages, like user ID or request ID, host, source IP, etc. You can read the Lograge documentation to get more information.


Here’s a simple example that captures three attributes:

class ApplicationController < ActionController::Base
  before_action :append_info_to_payload

  def append_info_to_payload(payload)
    super
    payload[:user_id] = current_user.try(:id)
    payload[:host] = request.host
    payload[:source_ip] = request.remote_ip
  end
end

The above three attributes are logged in environment.rb (production.rb/development.rb) with this block.

config.lograge.custom_options = lambda do |event|
  event.payload
end

Troubleshoot Problems Faster Using Papertrail

Papertrail is a popular cloud-hosted log management service that integrates with different logging libraries. It makes it easier to centralize all your Ruby on Rails log management in the cloud, and you can quickly track real-time activity, making it easier to identify and troubleshoot issues in production applications.

Papertrail provides numerous features for handling Ruby on Rails log files, including:

Instant log visibility: Papertrail provides fast search and team-wide
access. It also provides analytics reporting and webhook monitoring, which
can be set up typically in less than a minute.

Aggregate logs: Papertrail aggregates logs across your entire deployment, making them available from a single location. It provides you with an easy way to access logs, including application logs, database logs, Apache logs, and more.


Tail and search logs: Papertrail lets you tail logs in real time from
multiple devices. With the help of advanced searching and filtering tools, you can quickly troubleshoot issues in a production environment.

Proactive alert notifications: Almost every application has critical events
that require human attention. That’s precisely why alerts exist. Papertrail gives you the ability to receive alerts via email, Slack, Librato®, PagerDuty, or any custom HTTP webhooks of your choice.


Log archives: You can load the Papertrail log archives into third-party utilities, such as Redshift or Hadoop.

Logs scalability: With Papertrail, you can scale your log volume and desired searchable duration.

Encryption: For your security, Papertrail supports optional TLS encryption
and certificate-based destination host verification.

Configuring Ruby on Rails to Send Logs to Papertrail

It’s an easy task to get started with Papertrail. If you already have log files, you can send them to Papertrail using Nxlog or remote_syslog2. These utilities monitor the log files and send new entries to Papertrail. Next, we’ll show you how to send events asynchronously from Ruby on Rails using the remote_syslog_logger.

Add the remote_syslog_logger gem to your Gemfile. If you are not using a Gemfile, run the following command:

$ gem install remote_syslog_logger

Change the environment configuration file to log via remote_syslog_logger. This is almost always in config/environment.rb (to affect all environments) or config/environments/<environment name>.rb, such as config/environments/production.rb (to affect only a specific environment). Update the host and port to the ones given to you in your Papertrail log destination settings.

config.logger = RemoteSyslogLogger.new('logsN.papertrailapp.com', XXXXX)

It’s that simple! Your logs should now be sent to Papertrail.

Papertrail is designed to help you troubleshoot customer problems, resolve error messages, improve slow database queries, and more. It gives you analytical tools to help identify and resolve system anomalies and potential security issues. Learn more about how Papertrail can give you frustration-free log management in the cloud, and sign up for a trial or the free plan to get started.

Read more
0 3 817
Product Manager
Product Manager

We’re no strangers to logging from Docker containers here at SolarWinds® Loggly®. In the past, we’ve demonstrated different techniques for logging individual Docker containers. But while logging a handful of containers is easy, what happens when you start deploying dozens, hundreds, or thousands of containers across different machines?

In this post, we’ll explore the best practices for logging applications deployed using Docker Swarm.

Intro to Docker Swarm

Docker Swarm is a container orchestration and clustering tool from the creators of Docker. It allows you to deploy container-based applications across a number of computers running Docker. Swarm uses the same command-line interface (CLI) as Docker, making it more accessible to users already familiar with Docker. And as the second most popular orchestration tool behind Kubernetes, Swarm has a rich ecosystem of third-party tools and integrations.

A swarm consists of manager nodes and worker nodes. Managers control how containers are deployed, and workers run the containers. In Swarm, you don’t interact directly with containers; instead, you define services that describe what the final deployment should look like. Swarm handles deploying, connecting, and maintaining containers until the running state matches the service definition.

For example, imagine you want to deploy an Nginx web server. Normally, you would start an Nginx container on port 80 like so:

$ docker run --name nginx --detach --publish 80:80 nginx

With Swarm, you instead create a service that defines what image to use, how many replica containers to create, and how those containers should interact with both the host and each other. For example, let’s deploy an Nginx image with three containers (for load balancing) and expose it over port 80.

$ docker service create --name nginx --detach --publish 80:80 --replicas 3 nginx

When the deployment is done, you can access Nginx using the IP address of any node in the Swarm.


To learn more about Docker services, see the services documentation.

The Challenges of Monitoring and Debugging Docker Swarm

Besides the existing challenges in container logging, Swarm adds another layer of complexity: an orchestration layer. Orchestration simplifies deployments by taking care of implementation details such as where and how containers are created. But if you need to troubleshoot an issue with your application, how do you know where to look? Without comprehensive logs, pinpointing the exact container or service where an error occurred can become an operational nightmare.

On the container side, nothing much changes from a standard Docker environment. Your containers still send logs to stdout and stderr, which the host Docker daemon accesses using its logging driver. But now your container logs include additional information, such as the service that the container belongs to, a unique container ID, and other attributes auto-generated by Swarm.

Consider the Nginx example. Imagine one of the containers stops due to a configuration issue. Without a monitoring or logging solution in place, the only way to know this happened is by connecting to a manager node using the Docker CLI and querying the status of the service. And while Swarm automatically groups log messages by service using the docker service logs command, searching for a specific container’s messages can be time-consuming because it only works when logged in to that specific host.

How Docker Swarm Handles Logs

Like a normal Docker deployment, Swarm has two primary log destinations: the daemon log (events generated by the Docker service), and container logs (events generated by containers). Swarm doesn’t maintain separate logs, but appends its own data to existing logs (such as service names and replica numbers).

The difference is in how you access logs. Instead of showing logs on a per-container basis using docker logs <container name>, Swarm shows logs on a per-service basis using docker service logs <service name>. This aggregates and presents log data from all of the containers running in a single service. Swarm differentiates containers by adding an auto-generated container ID and instance ID to each entry.

For example, the following message was generated by the second container of the nginx_nginx service, running on swarm-client1.

# docker service logs nginx_nginx
nginx_nginx.2.subwnbm15l3f@swarm-client1 | 10.255.0.2 - - [01/Jun/2018:22:21:11 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0" "-"

To learn more about the logs command, see the Docker documentation.

Options for Logging in Swarm

Since Swarm uses Docker’s existing logging infrastructure, most of the standard Docker logging techniques still apply. However, to centralize your logs, each node in the swarm will need to be configured to forward both daemon and container logs to the destination. You can use a variety of methods such as Logspout, the daemon logging driver, or a dedicated logger attached to each container.

Best Practices to Improve Logging

To log your swarm services effectively, there are a few steps you should take.

1. Log to STDOUT and STDERR in Your Apps

Docker automatically forwards all standard output from containers to the built-in logging driver. To take advantage of this, applications running in your Docker containers should write all log events to STDOUT and STDERR. If you write logs to files inside your containers instead, you risk losing crucial data about your deployment when those containers are rescheduled or removed.

2. Log to Syslog Or JSON

Syslog and JSON are two of the most commonly supported logging formats, and Docker is no exception. Docker stores container logs as JSON files by default, but it includes a built-in driver for logging to Syslog endpoints. Both JSON and Syslog messages are easy to parse, contain critical information about each container, and are supported by most logging services. Many container-based loggers such as Logspout support both JSON and Syslog, and Loggly has complete support for parsing and indexing both formats.

3. Log to a Centralized Location

A major challenge in cluster logging is tracking down log files. Services could be running on any one of several different nodes, and having to manually access log files on each node can become unsustainable over time. Centralizing logs lets you access and manage your logs from a single location, reducing the amount of time and effort needed to troubleshoot problems.

One common solution for container logs is dedicated logging containers. As the name implies, dedicated logging containers are created specifically to gather and forward log messages to a destination such as a syslog server. Dedicated containers automatically collect messages from other containers running on the node, making setup as simple as running the container.

Why Loggly Works for Docker Swarm

Normally you would access your logs by connecting to a master node, running docker service logs <service name>, and scrolling down to find the logs you’re looking for. Not only is this labor-intensive, but it’s slow because you can’t easily search, and it’s difficult to automate with alerts or create graphs. The more time you spend searching for logs, the longer problems go unresolved. This also means creating and maintaining your own log centralization infrastructure, which can become a significant project on its own.

Loggly is a log aggregation, centralization, and parsing service. It provides a central location for you to send and store logs from the nodes and containers in your swarm. Loggly automatically parses and indexes messages so you can search, filter, and chart logs in real-time. Regardless of how big your swarm is, your logs will be handled by Loggly.

Sending Swarm Logs to Loggly

The easiest way to send your container logs to Loggly is with Logspout. Logspout is a container that automatically routes all log output from other containers running on the same node. When you deploy it as a service in global mode, Swarm automatically creates a Logspout container on each node in the swarm.

To route your logs to Loggly, provide your Loggly Customer Token and a custom tag, then specify a Loggly endpoint as the logging destination.

# docker service create --name logspout --mode global --detach --volume=/var/run/docker.sock:/var/run/docker.sock --volume=/etc/hostname:/etc/host_hostname:ro -e SYSLOG_STRUCTURED_DATA="<Loggly Customer Token>@41058 tag=\"<custom tag>\"" gliderlabs/logspout syslog+tcp://logs-01.loggly.com:514

You can also define a Logspout service using Compose.

# docker-compose-logspout.yml
version: "3"

networks:
  logging:

services:
  logspout:
    image: gliderlabs/logspout
    networks:
      - logging
    volumes:
      - /etc/hostname:/etc/host_hostname:ro
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      SYSLOG_STRUCTURED_DATA: "<Loggly Customer Token>@41058"
      tag: "<custom tag>"
    command: syslog+tcp://logs-01.loggly.com:514
    deploy:
      mode: global

Use docker stack deploy to deploy the Compose file to your swarm. <stack name> is the name that you want to give to the deployment.

# docker stack deploy --compose-file docker-compose-logspout.yml <stack name>

As soon as the deployment is complete, messages generated by your containers start appearing in Loggly.

Configuring Dashboards and Alerts

Since Swarm automatically appends information about the host, service, and replica to each log message, we can create Dashboards and Alerts similar to those for a single-node Docker deployment. For example, Loggly automatically breaks down logs from the Nginx service into individual fields.


We can create Dashboards that show, for example, the number of errors generated on each node, as well as the container activity level on each node.


Alerts are useful for detecting changes in the status of a service. If you want to detect a sudden increase in errors, you can easily create a search that scans messages from a specific service for error-level logs.


You can select this search from the Alerts screen and specify a threshold. For example, this alert triggers if the Nginx service logs more than 10 errors over a 5-minute period.


Conclusion

While Swarm can add a layer of complexity over a typical Docker installation, logging it doesn’t have to be difficult. Tools like Logspout and Docker logging drivers have made it easier to collect and manage container logs no matter where those containers are running. And with Loggly, you can easily deploy a complete, cluster-wide logging solution across your entire environment.

Read more
2 1 725
Level 10

Have you ever wondered what happens when you type an address into your browser? The first step is the translation of a domain name (such as pingdom.com) to an IP address. Resolving domain names is done through a series of systems and protocols that make up the Domain Name System (DNS). Here we’ll break down what DNS is, and how it powers the underlying infrastructure of the internet.

What is DNS?

Traffic across the internet is routed by an identifier called an IP address. You may have seen IP addresses before. IPv4 addresses are a series of four numbers between 0 and 255, separated by periods (for example: 123.45.67.89).

IP addresses are at the core of communicating between devices on the internet, but they are hard to memorize and can change often, even for the same service. To get around these problems, we give names to IP addresses. For example, when you type https://www.pingdom.com into your web browser, it translates that name into an IP address, which your computer then uses to access a server that ultimately responds with the contents of the page that your browser displays. If a new server is put into place with a new IP address, that name can simply be updated to point to the new address.

These records are stored in the name server for a given name, or “zone,” in DNS parlance. These zones can include many different records and record types for the base name and subdomains in that zone.

The internet is decentralized, designed to withstand failure, and built to avoid relying on a single source of truth. DNS is built for this environment using recursion, which enables DNS servers to talk to each other to find the answer for a request. Resolution starts at one of 13 globally maintained “root” server clusters and works down through progressively more authoritative servers until the answer is found.

Anatomy of a DNS Request

When you type “pingdom.com” into your browser and hit enter, your browser doesn’t directly ask the web servers for that page. First, a multi-step interaction with DNS servers must happen to translate pingdom.com into an IP address that is usable for establishing a connection and routing traffic. Here’s what that interaction looks like:

  1. The recursive DNS server requests pingdom.com from a DNS root server. The root server replies with the IP address of the .com TLD name server.
  2. The recursive DNS server requests pingdom.com from the .com TLD name server. The TLD name server replies with the authoritative name server for pingdom.com.
  3. The recursive DNS server requests pingdom.com from the pingdom.com name server. The name server replies with the IP address from the A record for pingdom.com. This IP address is returned to the client.
  4. The client requests pingdom.com using the web server’s IP address that was just resolved.

In subsequent requests, the recursive name server will have the IP address for pingdom.com.

This IP address is cached for a period of time determined by the pingdom.com nameserver. This value is called the time-to-live (TTL) for that domain record. A high TTL for a domain record means that local DNS resolvers will cache responses for longer and give quicker responses. However, making changes to DNS records can take longer due to the need to wait for all cached records to expire. Conversely, domain records with low TTLs can change much more quickly, but DNS resolvers will need to refresh their records more often.
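From an application’s point of view, all of this recursion and caching hides behind a single resolver call. As a rough illustration (a minimal Python sketch, independent of any particular tool), this is all most clients do:

import socket

# Ask the operating system's resolver (which in turn talks to the
# recursive DNS server) for the addresses behind a hostname.
for family, _, _, _, sockaddr in socket.getaddrinfo("pingdom.com", 443, proto=socket.IPPROTO_TCP):
    print(family, sockaddr)

Whether the answer comes from a cache or a full walk of the hierarchy is invisible to the caller; only the response time hints at the difference.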

Not Just for the Web

The DNS protocol is for anything that requires a decentralized name, not just the web. To differentiate between various types of servers registered with a nameserver, we use record types. For example, email servers are part of DNS. If a domain name has an MX record, it is signaling that the address associated with that record is an email server.

Some of the more common record types you will see are:

  • A Record – used to point names directly at IPv4 addresses. This is used by web browsers.
  • AAAA Record – used to point names directly at IPv6 addresses. This is used by web browsers when a device is on an IPv6 network.
  • CNAME Record – also known as the Canonical Name record, used to point one DNS name at another. This is common when using platforms as a service such as Heroku, or cloud load balancers that provide an external domain name rather than an IP address.
  • MX Record – as mentioned before, MX records are used to point a domain to mail servers.
  • TXT Record – arbitrary text attached to a domain name. This can be used to attach validation or other information about a domain name as part of the DNS system. A name can also hold several records of the same type; for example, a domain often has multiple MX or TXT records.
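If you want to inspect these record types yourself from code, one option is the third-party dnspython package (an assumption here; any resolver library would do). A minimal sketch:

import dns.resolver  # third-party package: pip install dnspython

# Look up a few common record types for a domain.
for rtype in ("A", "AAAA", "MX", "TXT"):
    try:
        answers = dns.resolver.resolve("pingdom.com", rtype)
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        continue  # no records of this type for the name
    for rdata in answers:
        print(rtype, rdata.to_text())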

DNS Security and Privacy

There are many parts to resolving a DNS request, and these parts are subject to security and privacy issues. First, how do we verify that the IP address we requested is actually the one on file with the domain’s root nameserver? Attacks exist that can disrupt the DNS chain, providing false information back to the client or triggering denial of service attacks upon sites. Untrusted network environments are vulnerable to man-in-the-middle attacks that can hijack DNS requests and provide back false results.

There is ongoing work to enhance the security of DNS with the Domain Name System Security Extensions (DNSSEC). This is a combination of new records, public-key cryptography, and establishing a chain of trust with DNS providers to ensure domain records have not been tampered with. Some DNS providers today offer the ability to enable DNSSEC, and its adoption is growing as DNS-based attacks become more prevalent.

DNS requests are also typically unencrypted, which allows attackers and observers to pry into the contents of a DNS request. This information is valuable, and your ISP or recursive zone provider may be providing this information to third parties or using it to track your activity. Furthermore, it may or may not contain personally identifiable information like your IP address, which can be correlated with other tracking information that third parties may be holding.

There are a few ways to help protect your privacy with DNS and prevent this sort of tracking:

1. Use a Trusted Recursive Resolver

Using a trusted recursive resolver is the first step to ensuring the privacy of your DNS requests. For example, the Cloudflare DNS service at https://1.1.1.1 is a fast, privacy-centric DNS resolver. Cloudflare doesn’t log IP addresses or track requests that you make against it at any time.

2. Use DNS over HTTPS (DoH)

DoH is another way of enhancing your privacy and security when interacting with DNS resolvers. Even when using a trusted recursive resolver, man-in-the-middle attacks can alter the returned contents back to the requesting client. DNSSEC offers a way to fix this, but adoption is still early, and relies on DNS providers to enable this feature.

DoH secures this at the client to DNS resolver level, enabling secure communication between the client and the resolver. The Cloudflare DNS service offers DNS over HTTPS, further enhancing the security model that their recursive resolver provides. Keep in mind that the domain you’re browsing is still available to ISPs thanks to Server Name Indication, but the actual contents, path, and other parts of the request are encrypted.
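As a rough illustration of the idea, here is a minimal Python sketch that resolves a name over HTTPS using Cloudflare’s JSON interface (the endpoint, query parameters, and accept header below reflect Cloudflare’s published DNS-over-HTTPS API at the time of writing, so treat them as assumptions rather than a specification):

import json
import urllib.request

# Resolve a name over HTTPS instead of plain DNS on port 53.
url = "https://cloudflare-dns.com/dns-query?name=pingdom.com&type=A"
req = urllib.request.Request(url, headers={"accept": "application/dns-json"})

with urllib.request.urlopen(req) as resp:
    answer = json.load(resp)

for record in answer.get("Answer", []):
    print(record["name"], record["type"], record["data"])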

Even without DNSSEC, you can still have a more private internet experience. Firefox recently switched over to using the Cloudflare DNS resolver for all requests by default. At this time, DoH isn’t enabled by default unless you are using the nightly build.

Monitoring DNS Problems

DNS is an important part of your site’s availability because a problem with it can cause a complete outage. DNS has been known to cause outages due to BGP attacks, TLD outages, and other unexpected issues. It’s important that your uptime or health check script includes DNS lookups.
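If you roll your own checks, the lookup itself is simple. Here is a minimal, hypothetical Python sketch of a health check that fails loudly on resolution errors (the hostname is just a placeholder):

import socket
import sys

HOST = "pingdom.com"  # placeholder: the domain your check cares about

try:
    ip = socket.gethostbyname(HOST)
except socket.gaierror as err:
    print(f"DNS resolution failed for {HOST}: {err}")
    sys.exit(1)

print(f"{HOST} resolved to {ip}")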

Using SolarWinds® Pingdom®, we can monitor for DNS problems using the uptime monitoring tool. Here we will change the DNS record for a domain and show you how the Pingdom tool responds. Once you have an uptime check added in Pingdom, click the “Reports” section, and “Uptime” under that section, then go to your domain of interest. Under the “Test Result Log” tab for an individual domain’s uptime report, hover over the failing entry to see why a check failed.

This tells us that for our domain, we have a “Non-recoverable failure in name resolution.” This lets us know to check our DNS records. After we fix the problem, our next check succeeds:

Pingdom gives us a second set of eyes to make sure our site is still up as expected.

Curious to learn more about DNS? Check out our post on how to test your DNS-configuration. You can also learn more about Pingdom uptime monitoring.

Read more
9 10 1,261
Level 9

When you’re troubleshooting a problem or tracking down a bug in Python, the first place to look for clues related to server issues is in the application log files.

Python includes a robust logging module in the standard library, which provides a flexible framework for emitting log messages. This module is widely used by various Python libraries and is an important reference point for most programmers when it comes to logging.

The Python logging module provides a way for applications to configure different log handlers and provides a standard way to route log messages to these handlers. As the Python.org documentation notes, there are four basic classes defined by the Python logging module: Loggers, Handlers, Filters, and Formatters. We’ll provide more details on these below.

Getting Started with Python Logs

There are a number of important steps to take when setting up your logs. First, you need to ensure logging is enabled in the applications you use. You also need to categorize your logs by name so they are easy to maintain and search. Naming the logs makes it easier to search through large log files, and to use filters to find the information you need.

To send log messages in Python, request a logger object. It should have a unique name to help filter and prioritize how your Python application handles various messages. We are also adding a StreamHandler to print the log on our console output. Here’s a simple example:

import logging

logging.basicConfig(handlers=[logging.StreamHandler()])
log = logging.getLogger('test')
log.error('Hello, world')

This outputs:

ERROR:test:Hello, world

This message consists of three fields. The first, ERROR, is the log level. The second, test, is the logger name. The third field, “Hello, world”, is the free-form log message.

Most problems in production are caused by unexpected or unhandled issues. In Python, such problems generate tracebacks where the interpreter tries to include all important information it could gather. This can sometimes make the traceback a bit hard to read, though. Let’s look at an example traceback. We’ll call a function that isn’t defined and examine the error message.

def test():
    nofunction()

test()

Which outputs:

Traceback (most recent call last):
  File '<stdin>', line 1, in <module>
  File '<stdin>', line 2, in test
NameError: global name 'nofunction' is not defined

This shows the common parts of a Python traceback. The error message is usually at the end of the traceback. It says “nofunction is not defined,” which is what we expected. The traceback also includes the lines of all stack frames that were touched when this error occurred. Here we can see that it occurred in the test function on line two. Stdin means standard input and refers to the console where we typed this function. If we were using a Python source file, we’d see the file name here instead.

Configuring Logging

You should configure the logging module to direct messages to go where you want them. For most applications, you will want to add a Formatter and a Handler to the root logger. Formatters let you specify the fields and timestamps in your logs. Handlers let you define where they are sent. To set these up, Python provides a nifty factory function called basicConfig.

import logging

logging.basicConfig(format='%(asctime)s %(message)s',
                    handlers=[logging.StreamHandler()])
logging.debug('Hello World!')

By default, Python will output uncaught exceptions to your system’s standard error stream. Alternatively, you could add a handler to the excepthook to send any exceptions through the logging module. This gives you the flexibility to provide custom formatters and handlers. For example, here we log our exceptions to a log file using the FileHandler:

import logging
import sys

logger = logging.getLogger('test')
fileHandler = logging.FileHandler('errors.log')
logger.addHandler(fileHandler)

def my_handler(type, value, tb):
    logger.exception('Uncaught exception: {0}'.format(str(value)))

# Install exception handler
sys.excepthook = my_handler

# Throw an error
nofunction()

Which results in the following log output:

$ cat errors.log
Uncaught exception: name 'nofunction' is not defined
None

In addition, you can filter logs by configuring the log level. One way to set the log level is through an environment variable, which gives you the ability to customize the log level in the development or production environment. Here’s how you can use the LOGLEVEL environment variable:

$ export LOGLEVEL='ERROR'
$ python
>>> import logging
>>> logging.basicConfig(handlers=[logging.StreamHandler()])
>>> logging.debug('Hello World!') # prints nothing
>>> logging.error('Hello World!')
ERROR:root:Hello World!
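Note that an environment variable doesn’t change the level by itself; your code has to read it and pass it to the logging module. A minimal sketch (LOGLEVEL is just a naming convention we’re assuming, not something the logging module looks for):

import logging
import os

# Read the desired level from the environment, defaulting to WARNING.
level = os.environ.get('LOGLEVEL', 'WARNING').upper()

logging.basicConfig(level=level, handlers=[logging.StreamHandler()])
logging.error('Hello World!')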

Logging from Modules

Modules intended for use by other programs should only emit log messages. These modules should not configure how log messages are handled. A standard logging best practice is to let the Python application importing and using the modules handle the configuration.

Another standard best practice to follow is that each module should use a logger named like the module itself. This naming convention makes it easy for the application to distinctly route various modules and helps keep the log code in the module simple.

You need just two lines of code to set up logging using a named logger. Once you do this, the __name__ variable contains the full name of the current module, so this will work in any module. Here’s an example:

import logging

log = logging.getLogger(__name__)

def do_something():
    log.debug('Doing something!')

Analyzing Your Logs with Papertrail

Python applications running on production servers can generate millions of log entries. Command line tools like tail and grep are often useful during the development process. However, they may not scale well when analyzing millions of log events spread across multiple servers.

Centralized logging can make it easier and faster for developers to manage a large volume of logs. By consolidating log files onto one integrated platform, you can eliminate the need to search for related data that is split across multiple apps, directories, and servers. Also, a log management tool can alert you to critical issues, helping you more quickly identify the root cause of unexpected errors, as well as bugs that may have been missed earlier in the development cycle.

For production-scale logging, a log management tool such as SolarWinds® Papertrail™ can help you better manage your data. Papertrail is a cloud-based platform designed to handle logs from any Python application, including Django servers.

The Papertrail solution provides a central repository for event logs. It helps you consolidate all of your Python logs using syslog, along with other application and database logs, giving you easy access all in one location. It offers a convenient search interface to find relevant logs. It can also stream logs to your browser in real time, offering a “live tail” experience. Check out the tour of the Papertrail solution’s features.


Papertrail is designed to help minimize downtime. You can receive alerts via email, or send them to Slack, Librato, PagerDuty, or any custom HTTP webhooks of your choice. Alerts are also accessible from a web page that enables customized filtering. For example, you can filter by name or tag.


Configuring Papertrail in Your Application

There are many ways to send logs to Papertrail depending on the needs of your application. You can send logs through journald, log files, Django, Heroku, and more. We will review the syslog handler below.

Python can send log messages directly to Papertrail with the Python SysLogHandler. Just set the endpoint to the log destination shown in your Papertrail settings. You can optionally format the timestamp or set the log level as shown below.

import logging
import socket
import sys  # needed for sys.excepthook below
from logging.handlers import SysLogHandler

syslog = SysLogHandler(address=('logsN.papertrailapp.com', XXXXX))
format = '%(asctime)s YOUR_APP: %(message)s'
formatter = logging.Formatter(format, datefmt='%b %d %H:%M:%S')
syslog.setFormatter(formatter)

logger = logging.getLogger()
logger.addHandler(syslog)
logger.setLevel(logging.INFO)

def my_handler(type, value, tb):
    logger.exception('Uncaught exception: {0}'.format(str(value)))

# Install exception handler
sys.excepthook = my_handler

logger.info('This is a message')
nofunction()  # log an uncaught exception

Conclusion

Python offers a well-thought-out framework for logging that makes it simple to enable and manage your log files. Getting started is easy, and a number of tools baked into Python automate the logging process and help ensure ease of use.

Papertrail adds even more functionality and tools for diagnostics and analysis, enabling you to manage your Python logs on a centralized cloud server. Quick to set up and easy to use, Papertrail consolidates your logs on a safe and accessible platform. It simplifies your ability to search log files, analyze them, and then act on them in real time—so that you can focus on debugging and optimizing your applications.

Learn more about how Papertrail can help optimize your development ecosystem.

Read more
0 0 695
Product Manager
Product Manager

The world may run on coffee, but it’s the alarm clock that gets us out of bed. It operates on a simple threshold. You set the time that’s important to you and receive an alert when that variable is true.

Like your alarm clock, today’s tooling for web service alerting often operates on simple thresholds, but unlike with your clock, there is a wide variety of metrics and it’s not as clear which should trigger an alert. Until we have something better than thresholds, engineers have to carefully weigh which metrics are actionable, how they are being measured, and what thresholds correspond to real-world problems.

Measure the Thing You Care About

In practice, this arguably simple process of reasoning about what you are monitoring, and how you are monitoring it, is rarely undertaken. More often, our metric choices and threshold values are guided by our preexisting tools. Hence, if our tools cannot measure latency, we do not alert on latency. This practice of letting our tools guide our telemetry content is an anti-pattern which results in unreliable problem detection and alerting.

Effective alerting requires metrics that are reliably actionable. You have to start by reasoning about the application and/or infrastructure you want to monitor. Only then can you choose and implement collection tools that get you the metrics you’re actually interested in, like queue size, DB roundtrip times, and inter-service latency.

One Reliable Signal

Effective alerting requires a singular, reliable telemetry signal, to which every collector can contribute. Developing and ensuring a reliable signal can be difficult, but it is orders of magnitude simpler than building out multiple disparate monitoring systems and trying to make them agree with each other – the way many shops, for example, alert from one system like Nagios and troubleshoot from another like Ganglia.

It’s arguably impossible to make multiple, fallible systems agree with each other in every case. They may usually agree, but every false positive or false negative undermines the credibility of both systems. Further, multiple systems rarely improve because it’s usually impossible to know which system was at fault when they disagree. Did the alerting system send a bogus alert or is there a problem with the data in the visualization system? If false positives arise from a single telemetry system, you simply iterate and improve that system.

Alert Recipient == Alert Creator

Crafting effective alerts involves knowing how your systems work. Each alert should trigger in the mind of its recipient an actionable cognitive model that describes how the production environment is being threatened. How does the individual piece of infrastructure that fired this alert affect the application? Why is this alert a problem?

Only engineers who understand the systems and applications we care about have the requisite knowledge to craft alerts that describe actionable threats to those systems and applications. Therefore effective alerting requires that the recipients of alerts be able to craft those alerts.

Push Notifications as Last Resort

Emergencies force context switches. They interrupt workflow and destroy productivity. Many alerts are necessary, but very few of them should be considered emergencies. At AppOptics, the preponderance of our alerts are delivered to group chat. We find that this is a timely notification medium which doesn’t interrupt productivity. Further, group chat allows everyone to react to the alert together, in a group context, rather than individually from an email-box or pager. This helps us avoid redundant troubleshooting effort, and keeps everyone synchronized on problem resolution.

Effective alerting requires an escalation system that can communicate problems in a way that is not interrupt-driven. There are myriad examples in other industries like healthcare and security systems, where, when every alert is interrupt-driven, human beings quickly begin to ignore the alerts. Push notifications should be a last resort.

Alerting is Hard

Effective alerting is a deceptively hard problem, which represents one of the biggest challenges facing modern operations engineers. A careful balance needs to be struck between the needs of the systems and the needs of the humans tending those systems.

Read more
3 1 619
Product Manager
Product Manager

For an infrastructure to be healthy, there must be good monitoring. The team should have a monitoring infrastructure that speeds up and simplifies the identification of problems, supporting prevention, maintenance, and correction. SolarWinds® AppOptics™ was created with the purpose of helping monitoring teams control their infrastructure, including Linux monitoring.

Monitoring overview

It is critical that a technology team prepare for any situation that occurs in their environment. The purpose of monitoring is to be aware of changes in the environment so that problems can be addressed with immediate action. A good monitoring history, properly interpreted, also lets you suggest environmental improvements based on the charts. For example, if a server shows elevated memory usage over a sustained period, you can purchase more memory or investigate the cause of the abnormal behavior before the environment becomes unavailable.

Monitoring metrics can be used for various purposes, such as tracking application availability for a given number of users, tool deployments, operating system update behavior, purchase requests, and hardware exchanges or upgrades. How you use them depends on the purpose of your deployment.

Linux servers have historically been difficult to monitor because most of the tools on the market target other platforms. In addition, some IT professionals struggle to make monitoring work properly on these servers, so when a disaster occurs, it is difficult to identify what happened.

Constant monitoring of servers and services used in production is critical for company environments. Server failures in virtualization, backup, firewalls, and proxies can directly affect availability and quality of service.

The Linux operating system offers basic monitoring facilities aimed at more experienced administrators, but real-time reports are needed for immediate action. You cannot count on an experienced system administrator always being available to access the servers, or on them covering every monitoring task by hand.

In the current job market, it is important to remember that Linux specialists are rare, and their availability is limited. There are cases where an expert administrator can only act on a server after the problem has been long-standing. Training teams to become Linux experts can be expensive and time-consuming, with potentially low returns.

Metrics used for monitoring

  1. CPU – It is crucial to monitor the CPU, as it can reach a high utilization rate and temperature. A CPU can have multiple cores, but an application can be pinned to just one of them, which points to risky behavior worth investigating.

  2. Load – This indicates how heavily the CPU is being used, how much work is queued for execution, and how that demand has trended over time.

  3. Disk Capacity and IO – Disk capacity is especially important when it comes to image servers, files, and VMs, as it can directly affect system shutdown, corrupt the operating system, or cause extreme IO slowness. Along with disk monitoring, it’s possible to plan for an eventual change or addition of a disk, and to verify the behavior of a disk that demonstrates signs of hardware failure.

  4. Network – When it comes to DNS, DHCP, firewall, file server, and proxy, it is extremely important to monitor network performance as input and output of data packets. With network performance logs, you can measure the utilization of the card, and create a plan to suit the application according to the use of the network.

  5. Memory – Monitoring memory alongside the other components helps you catch conditions that can bring a system to an immediate stop, such as memory exhaustion or a single application consuming far more than its share.

  6. Swap – This is virtual memory created by the system and allocated to disk to be used when necessary. Its high utilization can indicate that the amount of memory for the server is insufficient.

With this information from your Linux systems, you can maintain good monitoring and a team that can act immediately on downtime that would otherwise paralyze critical systems.

Monitoring with AppOptics

AppOptics is a web-based monitoring tool that enables you to set up a real-time monitoring environment, create alerts by email, and work with thresholds and monitoring history. You can also create monitoring levels with profiles of the equipment to be monitored, and provide simple monitoring views from which a specialist can be engaged or a ticket opened for immediate action when needed.

This tool can also be an ally of an ITIL/COBIT team, which can use the reports to justify scheduled and unscheduled stops, and clarify systems that historically have problems. It can also be used to justify the purchase of new equipment, software upgrades, or the migration of a system that no longer meets the needs of a company.

AppOptics can be installed in major Linux distributions such as Red Hat, CentOS, Ubuntu, Debian, Fedora, and Amazon Linux. Its deployment is easy, fast, and practical.

Installing the AppOptics Agent on the Server

Before you start, you’ll need an account with AppOptics. If you don’t already have one, you can create a demo account which will give you 14 days to try the service, free of charge. Sign up here.

First, to allow AppOptics to aggregate the metrics from the server, you will need to install the agent on all instances. To do this, you’ll need to reference your AppOptics API token when setting up the agent. Log in to your AppOptics account and navigate to the Infrastructure page.

Locate the Add Host button, and click on it. It should look similar to the image below.

Fig. 2. AppOptics Host Agent Installation

You can follow a step-by-step guide on the Integration page, where there are Easy Install and Advanced options for users. I used an Ubuntu image in the AWS Cloud, but this will work on almost any Linux server.

Note: Prior to installation of the agent, the bottom of the dialog below will not contain the success message.

Copy the command from the first box, and then SSH into the server and run the Easy Install script.

Fig. 3. Easy Install Script to Add AppOptics Agent to a Server

When the agent installs successfully, you should see a confirmation message in your terminal. The “Confirm successful installation” box on the AppOptics agent screen should then show a white-on-blue checkmark along with “Agent connected,” similar to the image below.

Fig. 4. Installing the AppOptics Agent on your Linux Instance

After installation, you can start configuring the dashboard for monitoring on the server. Click on the hostname link in the Infrastructure page, or navigate to the Dashboards page directly, and then select the Host Agent link to view the default dashboard provided by AppOptics.

Working with the Host Agent Dashboard

The default Host Agent Dashboard provided by AppOptics offers many of the metrics discussed earlier, related to the performance of the instance itself, and should look similar to the image below.

Fig. 6. Default Host Agent Dashboard

One common pattern is to create dashboards for each location you want to monitor. Let’s use “Datacenter01” for our example. Head to Dashboards and click the Create a New Dashboard button.

You can choose the type of chart (Line, Stacked, or Big Number), and then pick the metric you want to display, such as CPU Percent, Swap, or Load. Within the dashboard, you can also select how long to monitor a group of equipment, or leave it monitored indefinitely.

Fig. 8. Custom Dashboard to View Linux System Metrics

Metrics – Combine existing metrics into new composite metrics tailored to what you want to monitor in the operating system.

Alerts – Create alerts for operating system metrics, including the conditions that trigger an alert and how long to wait before issuing a new one.

Integrations – Add host agent plug-ins to extend monitoring to your applications.

Conclusion

Monitoring your Linux servers is critical as they represent the basis of your infrastructure. You need to know immediately when there is a sudden change in CPU or memory usage that could affect the performance of your applications. AppOptics has a range of ready-made tools, customizable monitoring panels, and reports that are critical for investigating chronic infrastructure problems. Learn more about AppOptics infrastructure monitoring and try it today with a free 14-day trial.

Read more
0 2 881
Product Manager
Product Manager

How do PHP logging frameworks fare when pushed to their limits? This analysis can help us decide which option is best for our PHP applications. Performance and reliability are important in a logging framework because we want the best performance out of our application and to minimize loss of data.

Our goals for these benchmark tests are to measure how long different frameworks take to process a large number of log messages across various logging handlers, and to determine which frameworks are more reliable at their limits (dropping few or no messages).

The frameworks we tried are:

  • native PHP logging (error_log and syslog built-in functions)
  • KLogger
  • Apache Log4php
  • Monolog

All of these frameworks use synchronous or “blocking” calls, as PHP functions typically do: the web server execution waits until the function/method call is finished before continuing. As for the handlers: error_log, KLogger, Log4php, and Monolog can write log messages to a text file, while error_log/syslog, Log4php, and Monolog can send messages to the local system logger. Finally, only Log4php and Monolog allow remote syslog connections.

NOTE: The term syslog can refer to various things. In this article, it can mean the PHP function of the same name, the system logger daemon (e.g., syslogd), or a remote syslog server (e.g., rsyslog).

Application and Handlers

For this framework benchmark, we built a PHP CodeIgniter 3 web app with a controller for each logging mechanism. Controller methods echo the difference in microtime() taken before and after logging, which is useful for manual tests. Each controller method call runs a loop that writes 10,000 INFO log messages in the case of the file handlers (except error_log, which can only produce E_ERROR messages), or 100,000 INFO messages to syslog. This lets us stress the logging system without over-burdening the web server request handler.

NOTE: You may see the full app source code at https://github.com/jorgeorpinel/php-logging-benchmark
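
To make the pattern concrete, here is a simplified sketch of what one such controller method does, using Monolog as the example handler. This is an illustration of the approach, not the code from the repository, and it assumes Monolog 1.x-style level constants:

<?php
// Simplified sketch of one benchmark controller method (illustrative, not the repo code).
require 'vendor/autoload.php';

use Monolog\Logger;
use Monolog\Handler\StreamHandler;

$log = new Logger('bench');
$log->pushHandler(new StreamHandler('/tmp/bench-monolog.log', Logger::INFO));

$messages = 10000;                 // 10,000 per request for the file handlers
$start = microtime(true);

for ($i = 0; $i < $messages; $i++) {
    $log->info("benchmark message $i");
}

$elapsed = microtime(true) - $start;

// Echo the timing so a manual curl (or ab) request reports it
printf("%d messages in %.3f s (%.4f ms per message)\n",
    $messages, $elapsed, 1000 * $elapsed / $messages);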

For the local handlers, first we tested writing to local files and kept track of the number of written logs in each test. We then tested the local system logger handler (which uses the /dev/log UNIX socket by default) and counted the number of logs syslogd wrote to /var/log/syslog.

As for the “remote” syslog server, we set up rsyslog on the system and configured it to accept both TCP and UDP logs, writing them to /var/log/messages. We recorded the number of logs there to determine whether any of them were dropped.

Fig. 1 System Architecture – Each arrow represents a benchmark test.

Methodology

We ran the application locally on Ubuntu with Apache (and mod_php). First, each Controller/method was “warmed up” by requesting its URL with curl, which ensures the PHP source is already compiled and cached when we run the actual benchmark tests. Then we used ApacheBench to stress test the local web app with 100 or 10 serial requests (file or syslog, respectively). For example:

ab -ln 100 localhost:8080/Native/error_log

ab -ln 10 localhost:8080/Monolog/syslog_udp

The total number of log calls in each test was 1,000,000 (each method). We gathered performance statistics from the tool’s report for each Controller/method (refer to figure 1).

Please note that in normal operation, actual drop rates should be much smaller, if there are any at all.

Hardware and OS

We ran both the sample app and the tests on an AWS EC2 micro instance, set up as a 64-bit Ubuntu 16.04 Linux box with an Intel(R) Xeon(R) CPU @ 2.40GHz, 1 GiB of memory, and an 8 GB SSD.

Native tests

The “native” controller uses a couple of PHP built-in error handling functions. It has two methods: one that calls error_log, which is configured in php.ini to write to a file, and one that calls syslog to reach the system logger. Both functions are used with their default parameters.
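
For reference, the calls behind those two methods look roughly like this (default parameters, as in the tests):

<?php
// Native PHP logging calls used by the two controller methods described above.

// error_log(): destination controlled by the error_log directive in php.ini
error_log('benchmark message');

// syslog(): goes to the local system logger; openlog() is optional but lets you set an ident
openlog('php-bench', LOG_PID, LOG_USER);
syslog(LOG_INFO, 'benchmark message');
closelog();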

error_log to file

By definition, no log messages can be lost by this method as long as the web server doesn’t fail. Its performance when writing to file will depend on the underlying file system and storage speed. Our test results:

error_log (native PHP file logger)
  • Requests per sec: 23.55 [#/sec] (mean)
  • Time per request: 42.459 [ms] (mean) ← divide by 10,000 logs written per request
NOTE: error_log can also be used to send messages to system log, among other message types.

syslog

Using error_log when error_log = syslog in php.ini, or simply using the syslog function, we can reach the system logger. This is similar to using the logger command in Linux.

syslog (native PHP system logger)
  • Requests per sec: 0.25 [#/sec] (mean)
  • Time per request: 4032.164 [ms] (mean) ← divide by 100,000 logs sent per request

This is typically the fastest way to reach the system logger, and syslogd is at least as robust as the web server, so no messages should be dropped (none were in our tests). Another advantage of the system logger is that it can be configured both to write to a file and to forward logs over the network.

KLogger test

KLogger is a “simple logging class for PHP” whose first stable release appeared in 2014. It can only write logs to a file, but its simplicity helps its performance. KLogger is PSR-3 compliant: it implements the LoggerInterface.

KLogger (simple PHP logging class)
  • Requests per sec: 14.11 [#/sec] (mean)
  • Time per request: 70.848 [ms] (mean) ← divide by 10,000 = 0.0070848 ms/msg
NOTE: This GitHub fork of KLogger allows local syslog usage as well. We did not try it.

Log4php tests

Log4php, first released in 2010, is part of the suite of loggers Apache provides for several popular programming languages. Logging to a file, it turns out to be a speedy contender, at least when the app runs on the Apache web server, which probably helps its performance. In local tests using PHP’s built-in server (the php -S command), it was actually the slowest contender!

Log4php (Apache PHP file logger)
  • Requests per sec: 18.70 [#/sec] (mean) × 10k = 187k msgs per sec
  • Time per request: 53.470 [ms] (mean) ÷ 10k = 0.0053 ms/msg

As for sending to syslog, it was actually our least performant option, but not by far:

Log4php to syslog
  • Local syslog socket: 0.08 ms per log, 0% dropped
  • Syslog over TCP/IP: around 24 ms per log, 0% dropped
  • Syslog over UDP/IP: 0.07 ms per log, 0.15% dropped

Some of the advantages Log4php has, which may offset its lack of performance, are Java-like XML configuration files (same as other Apache loggers, such as the popular log4j), six logging destinations, and three message formats.

NOTE: Remote syslog over TCP, however, doesn’t seem to be well supported at this time. We had to use the general-purpose LoggerAppenderSocket, which was really slow, so we only ran 100,000 messages in that test.

Monolog tests

Monolog, like KLogger, is PSR-3 compliant, and, like Log4php, it’s a full logging framework that can send logs to files, sockets, email, databases, and various web services. It was first released in 2011.

Monolog features many integrations with popular PHP frameworks, making it a popular alternative. Monolog beat its competitor Log4php in our tests, but it’s still neither the fastest nor the most reliable option, although it’s probably one of the easiest for web developers to adopt.

Monolog (full PHP logging framework)
  • Requests per sec: 4.93 [#/sec] (mean) × 10k
  • Time per request: 202.742 [ms] (mean) ÷ 10k = 0.0203 ms/msg

Monolog over Syslog:

Monolog over syslog
  • UNIX socket: 0.062 ms per log, less than 0.01% dropped
  • TCP: 0.06 ms per log, 0.29% dropped
  • UDP: 0.079 ms per log, 0% dropped
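
For context, wiring up the three Monolog paths compared above might look roughly like this. The handler class names are the ones discussed in this article; the hostnames, ports, and channel names are placeholders:

<?php
// Sketch of the three Monolog syslog paths compared above (placeholders, not production config).
require 'vendor/autoload.php';

use Monolog\Logger;
use Monolog\Handler\SyslogHandler;     // local syslogd via the system logger
use Monolog\Handler\SocketHandler;     // remote rsyslog over TCP
use Monolog\Handler\SyslogUdpHandler;  // remote rsyslog over UDP

$local = new Logger('local');
$local->pushHandler(new SyslogHandler('php-bench'));

$tcp = new Logger('tcp');
$tcp->pushHandler(new SocketHandler('tcp://rsyslog.example.com:514'));

$udp = new Logger('udp');
$udp->pushHandler(new SyslogUdpHandler('rsyslog.example.com', 514));

$local->info('hello syslogd');
$tcp->info('hello rsyslog over TCP');
$udp->info('hello rsyslog over UDP');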

Now let’s look at graphs that summarize and compare the results above. These charts show the tradeoff between faster but more limited, lower-level native logging methods and less performant but full-featured frameworks:

Local File Performance Comparison

Fig 2. Time per message written to file [ms/msg]

Local Syslog Performance and Drop Rates

Log handler or “appender” names vary from framework to framework. For native PHP, we just use the syslog function (KLogger doesn’t support this); in Log4php, it’s a class called LoggerAppenderSyslog; and in Monolog it’s called SyslogHandler.

Fig 3. Time per message sent to syslogd via socket [ms/msg]

Fig 4. Drop rates to syslogd via socket [%]

Remote Syslog Performance and Drop Rates

The appenders are LoggerAppenderSocket in Log4php, and SocketHandler and SyslogUdpHandler in Monolog.

To measure the drop rates, we leveraged the $RepeatedMsgReduction configuration parameter of rsyslog, which collapses identical messages into a single entry followed by a second message with the count of further repetitions. In the case of Log4php, since the default message includes a timestamp that varies in every single log, we instead forwarded the logs to SolarWinds® Loggly® (syslog setup takes seconds) and used a filtered, interactive log monitoring dashboard to count the total logs received.

TCP

Fig 5. Time per message sent via TCP to rsyslog

Fig 6. Drop rates to rsyslog (TCP) [%]

UDP

Fig 7. Time per message sent on UDP to rsyslog
Fig 8. Drop rates to rsyslog (UDP)

Conclusion

Each logging framework is different, and while each could be the best fit for specific projects, our recommendations are as follows. Nothing beats the performance of native syslog for system admins who know their way around the syslogd or syslog-ng daemons, or who want to forward logs to a cloud service such as Loggly. If what’s needed is a simple yet powerful way to log locally to files, KLogger offers PSR-3 compliance and is almost as fast as native error_log, although Log4php does seem to edge it out when the app runs on Apache. For a more complete framework, Monolog seems to be the most well-rounded option, particularly when considering remote logging via TCP/IP.

After deciding on a logging framework, your next big decision is choosing a log management solution. Loggly provides unified log analysis and monitoring for all your servers in a single place. You can configure your PHP servers to forward syslog to Loggly or simply use Monolog’s LogglyHandler, which is easy to set up in your app’s code. Try Loggly for free and take control over your PHP application logs.

Read more
0 2 459
Level 10

Look back at almost any online technology business 10, or even five, years ago and you’d see a clear distinction between what the CTO and the CMO did in their daily roles. The former oversaw the building of technology and products, whilst the latter drove the marketing that brought in the customers to use said technology. In short, the two took care of two very different sides of the same coin.

Marketing departments traditionally measure their success against KPIs such as the number of conversions a campaign brought in versus the cost of running it. Developers measure their performance on how quickly and effectively they develop new technologies.

Today, companies are shifting focus toward a customer-centric approach, where customer experience and satisfaction are paramount. After all, how your customers feel about your products can make or break a business.

Performance diagnostic tools can help you optimize a slow web page but won’t show you whether your visitors are satisfied.

So where do the classic stereotypes that engineers only care about performance and marketers only care about profit fit into the customer-centric business model? The answer is they don’t: in a business where each department works against the same metric, improving their customers’ experience, having separate KPIs is as redundant as a trap door in a canoe.

The only KPI that matters is “are my customers happy?”

Developers + Marketing KPIs = True

With technology integral to any online business, we marketers can now gather so much data, in such detail, that we’re on the front line when it comes to gauging the satisfaction and experience of our customers. We can see what path a visitor took on our website, how long they took to complete their journey, and whether they achieved what they set out to do.

Armed with this, we stand in a position to influence the technologies developers build and use.

Support teams, no longer confined to troubleshooting customer problems, have become Customer Success teams that directly influence how developers build products, armed with first-hand data from their customers.

So as the lines blur between departments, it shouldn’t come as a surprise that engineering teams should care about marketing metrics. After all, if a product is only as effective as the people who use it, engineers build better products and websites when they know how customers intend to use them.

Collaboration is King

“How could engineers possibly make good use of marketing KPIs?” you might ask. After all, the two are responsible for separate ends of your business but can benefit from the same data.

Take a vital page on your business’s website: it’s not the fastest page on the net but its load time is consistent and it achieves its purpose: to convert your visitors to customers. Suddenly your bounce rate has shot up from 5% to 70%.

Ask an engineer to troubleshoot the issue and they might tell you that the page isn’t efficient: it takes 2.7 seconds to load, 0.7 seconds over the commonly cited two-second benchmark, and, what’s more, some of the files on your site are huge.

Ask a marketer the same question and they might tell you that the content is sloppy, making the purpose of the page unclear, the colors are off-brand, and an important CTA is missing.

Even though both have been looking at the same page, they’ve come to two very different results, but the bottom line is that your customer doesn’t care about what went wrong. What matters is that the issue is identified and solved, quickly.

Unified Metrics Mean Unified Monitoring

Having unified KPIs across the various teams in your organization means they should all draw their data from the same source: a single, unified monitoring tool.

For businesses where the customer comes first, a new breed of monitoring is evolving that offers organizations this unified view, centered on how your customers experience your site: Digital Experience Monitoring. Or, seeing as everything we do is digital, how about we just call it Experience Monitoring?

With Digital Experience Monitoring, your marketers and your engineering teams can follow a customer’s journey through your site and see how they navigated through it, and where and why interest became a sale or a lost opportunity.

Let’s go back to our previous example: both your marketer and your engineer will see that although your bounce rate skyrocketed, the page load time and size stayed consistent. What they might also see is that the onboarding you implemented, which coincides with the bounce rate spike, is confusing your customers, so they leave frustrated and unwilling to convert.

Digital Experience Monitoring gives a holistic view of your website’s health and helps you answer questions like:

  • Where your visitors come from
  • When they visit your site
  • What they visit and the journey they take to get there
  • How your site’s performance impacts your visitors

By giving your internal teams access to the same metrics, you foster greater transparency across your organization, which leads to faster resolution of issues, deeper knowledge of your visitors, and better insight into what your customers love about your products.

Pingdom’s Digital Experience Monitoring, Visitor Insights, bridges the gap between site performance and customer satisfaction, meaning you can guess less and know more about how your visitors experience your site.

Read more
1 1 386
Level 9

What are some common problems that can be detected with the handy router logs on Heroku? We’ll explore them and show you how to address them easily and quickly by monitoring Heroku with SolarWinds Papertrail.

One of the first cloud platforms, Heroku is a popular platform as a service (PaaS) that has been in development since June 2007. It allows developers and DevOps specialists to easily deploy, run, manage, and scale applications written in Ruby, Node.js, Java, Python, Clojure, Scala, Go, and PHP.

To learn more about Heroku, head to the Heroku Architecture documentation.

Intro to Heroku Logs

Logging in Heroku is modular, similar to gathering system performance metrics. Logs are time-stamped events that can come from any of the processes running in all application containers (Dynos), system components, or backing services. Log streams are aggregated and fed into the Logplex—a high-performance, real-time system for log delivery into a single channel.

Run-time activity, as well as dyno restarts and relocations, shows up in the application logs, which include output from the application code deployed on Heroku, from services like the web server or the database, and from the app’s libraries. Scaling, load, and memory usage metrics, among other structural events, can be monitored with system logs, which collect messages about actions the Heroku platform infrastructure takes on behalf of your app. These are two of the most common types of logs available on Heroku.

To fetch logs from the command line, we can use the heroku logs command. More details on this command, such as output format, filtering, or ordering logs, can be found in the Logging article of Heroku Devcenter.

$ heroku logs
2019-09-16T15:13:46.677020+00:00 app[web.1]: Processing PostController#list (for 208.39.138.12 at 2010-09-16 15:13:46) [GET]
2018-09-16T15:13:46.677902+00:00 app[web.1]: Rendering post/list
2018-09-16T15:13:46.698234+00:00 app[web.1]: Completed in 74ms (View: 31, DB: 40) | 200 OK [http://myapp.heroku.com/]
2018-09-16T15:13:46.723498+00:00 heroku[router]: at=info method=GET path='/posts' host=myapp.herokuapp.com fwd='204.204.204.204' dyno=web.1 connect=1ms service=18ms status=200 bytes=975
# © 2018 Salesforce.com. All rights reserved.

Heroku Router Logs

Router logs are a special case of logs that exist somewhere between the app logs and the system logs—and are not fully documented on the Heroku website at the time of writing. They carry information about HTTP routing within Heroku Common Runtime, which manages dynos isolated in a single multi-tenant network. Dynos in this network can only receive connections from the routing layer. These routes are the entry and exit points of all web apps or services running on Heroku dynos.

To tail only the router logs, use the heroku logs -tp router CLI command.

$ heroku logs -tp router
2018-08-09T06:24:04.621068+00:00 heroku[router]: at=info method=GET path='/db' host=quiet-caverns-75347.herokuapp.com request_id=661528e0-621c-4b3e-8eef-74ca7b6c1713 fwd='104.163.156.140' dyno=web.1 connect=0ms service=17ms status=301 bytes=462 protocol=https
2018-08-09T06:24:04.902528+00:00 heroku[router]: at=info method=GET path='/db/' host=quiet-caverns-75347.herokuapp.com request_id=298914ca-d274-499b-98ed-e5db229899a8 fwd='104.163.156.140' dyno=web.1 connect=1ms service=211ms status=200 bytes=3196 protocol=https
2018-08-09T06:24:05.002308+00:00 heroku[router]: at=info method=GET path='/stylesheets/main.css' host=quiet-caverns-75347.herokuapp.com request_id=43fac3bb-12ea-4dee-b0b0-2344b58f00cf fwd='104.163.156.140' dyno=web.1 connect=0ms service=3ms status=304 bytes=128 protocol=https
2018-08-09T08:37:32.444929+00:00 heroku[router]: at=info method=GET path='/' host=quiet-caverns-75347.herokuapp.com request_id=2bd88856-8448-46eb-a5a8-cb42d73f53e4 fwd='104.163.156.140' dyno=web.1 connect=0ms service=127ms status=200 bytes=7010 protocol=https
# Fig 1. Heroku router logs in the terminal

Heroku routing logs always start with a timestamp and the “heroku[router]” source/component string, followed by a specially formatted message. This message begins with one of “at=info”, “at=warning”, or “at=error” (the log levels), and can contain up to 14 other detailed fields, such as the following (a quick way to parse these fields out of a log line is sketched after the list):

  • Heroku error “code” (optional) – Heroku-specific error codes that complement the HTTP status codes; present for all errors and warnings, and for some info messages
  • Error “desc” (optional) – Description of the error, paired with the codes above
  • HTTP request “method”, e.g., GET or POST – May be related to some issues
  • HTTP request “path” – URL location for the request; useful for knowing where to look in the application code
  • HTTP request “host” – Host header value
  • The Heroku HTTP Request ID – Can be used to correlate router logs with application logs
  • HTTP request “fwd” – X-Forwarded-For header value
  • Which “dyno” serviced the request – Useful for troubleshooting specific containers
  • “connect” – Time (ms) spent establishing a connection to the web server(s)
  • “service” – Time (ms) spent proxying data between the client and the web server(s)
  • HTTP response code or “status” – Quite informative in case of issues
  • Number of “bytes” transferred in total for this web request
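
Since router lines are simple key=value pairs, they are easy to pick apart programmatically. Here’s a small PHP sketch; the sample line and field names follow the format above, and the check at the end is just an illustration:

<?php
// Minimal sketch: extracting the key=value fields from one Heroku router log line.
$line = 'at=error code=H12 desc="Request timeout" method=GET path="/sleep-30" '
      . 'host=example.herokuapp.com dyno=web.1 connect=1ms service=30001ms '
      . 'status=503 bytes=0 protocol=https';

preg_match_all('/(\w+)=("[^"]*"|\S*)/', $line, $matches, PREG_SET_ORDER);

$fields = [];
foreach ($matches as $m) {
    $fields[$m[1]] = trim($m[2], '"');
}

// Illustrative check: flag router-level errors and 5xx responses
if (($fields['at'] ?? '') === 'error' || (int) ($fields['status'] ?? 0) >= 500) {
    printf("Router problem on %s: %s (status %s)\n",
        $fields['dyno'] ?? 'unknown dyno',
        $fields['code'] ?? 'no Heroku code',
        $fields['status'] ?? '?');
}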

Common Problems Observed with Router Logs

Examples in this article are manually color-coded. Typical ways to address each issue are also provided for context.

Common HTTP Status Codes

404 Not Found Error

Problem: Error accessing nonexistent paths (regardless of HTTP method):

2018-07-30T17:10:18.998146+00:00 heroku[router]: at=info method=POST path='/saycow' host=heroku-app-log.herokuapp.com request_id=e5634f81-ec54-4a30-9767-bc22365a2610 fwd='187.220.208.152' dyno=web.1 connect=0ms service=15ms status=404 bytes=32757 protocol=https
2018-07-27T22:09:14.229118+00:00 heroku[router]: at=info method=GET path='/irobots.txt' host=heroku-app-log.herokuapp.com request_id=7a32a28b-a304-4ae3-9b1b-60ff28ac5547 fwd='187.220.208.152' dyno=web.1 connect=0ms service=31ms status=404 bytes=32769 protocol=https

Solution: Implement or change those URL paths in the application or add the missing files.

500 Server Error

Problem: There’s a bug in the application:

2018-07-31T16:56:25.885628+00:00 heroku[router]: at=info method=GET path='/' host=heroku-app-log.herokuapp.com request_id=9fb92021-6c91-4b14-9175-873bead194d9 fwd='187.220.247.218' dyno=web.1 connect=0ms service=3ms status=500 bytes=169 protocol=https

Solution: The application logs have to be examined to determine the cause of the internal error in the application’s code. Note that HTTP Request IDs can be used to correlate router logs against the web dyno logs for that same request.

Common Heroku Error Codes

Other problems commonly detected by router logs can be explored in the Heroku Error Codes. Unlike HTTP codes, these error codes are not standard and only exist in the Heroku platform. They give more specific information on what may be producing HTTP errors.

H14 – No web dynos running

Problem: The app has no web dynos set up:

2018-07-30T18:34:46.027673+00:00 heroku[router]: at=error code=H14 desc='No web processes running' method=GET path='/' host=heroku-app-log.herokuapp.com request_id=b8aae23b-ff8b-40db-b2be-03464a59cf6a fwd='187.220.208.152' dyno= connect= service= status=503 bytes= protocol=https

Notice that the above case is an actual error message, which includes both Heroku error code H14 and a description. HTTP 503 means “service currently unavailable.”

Note that Heroku router error pages can be customized. These apply only to errors where the app doesn’t respond to a request, e.g., a 503.

Solution: Use the heroku ps:scale command (for example, heroku ps:scale web=1) to start the app’s web dynos.

H12 – Request timeout

Problem: There’s a request timeout (app takes more than 30 seconds to respond):

2018-08-18T07:11:15.487676+00:00 heroku[router]: at=error code=H12 desc='Request timeout' method=GET path='/sleep-30' host=quiet-caverns-75347.herokuapp.com request_id=1a301132-a876-42d4-b6c4-a71f4fe02d05 fwd='189.203.188.236' dyno=web.1 connect=1ms service=30001ms status=503 bytes=0 protocol=https

Error code H12 indicates the app took over 30 seconds to respond to the Heroku router.

Solution: Code that requires more than 30 seconds must run asynchronously (e.g., as a background job) in Heroku. For more info read Request Timeout in the Heroku DevCenter.

H18 – Server Request Interrupted

Problem: The application encountered too many requests (server overload):

2018-07-31T18:52:54.071892+00:00 heroku[router]: sock=backend at=error code=H18 desc='Server Request Interrupted' method=GET path='/' host=heroku-app-log.herokuapp.com request_id=3a38b360-b9e6-4df4-a764-ef7a2ea59420 fwd='187.220.247.218' dyno=web.1 connect=0ms service=3090ms status=503 bytes= protocol=https

Solution: This problem may indicate that the application needs to be scaled up, or the app performance improved.

H80 – Maintenance mode

Problem: Maintenance mode generates an info router log with Heroku code H80:

2018-07-30T19:07:09.539996+00:00 heroku[router]: at=info code=H80 desc='Maintenance mode' method=GET path='/' host=heroku-app-log.herokuapp.com request_id=1b126dca-1192-4e98-a70f-78317f0d6ad0 fwd='187.220.208.152' dyno= connect= service= status=503 bytes= protocol=https

Solution: Disable maintenance mode with heroku maintenance:off

Papertrail

Papertrail™ is a cloud log management service designed to aggregate Heroku app logs, text log files, and syslogs, among many others, in one place. It helps you to monitor, tail, and search logs via a web browser, command-line, or an API. The Papertrail software analyzes log messages to detect trends, and allows you to react instantly with automated alerts.

The Event Viewer is a live aggregated log tail with auto-scroll, pause, search, and other unique features. Everything in log messages is searchable, and new logs still stream in real time in the event viewer when searched (or otherwise filtered). Note that Papertrail reformats the timestamp and source in its Event Viewer to make it easier to read.

Fig 2. The Papertrail Event Viewer.

Provisioning Papertrail on your Heroku apps is extremely easy: run heroku addons:create papertrail from the terminal. (See the Papertrail article in Heroku’s DevCenter for more info.) Once set up, the add-on can be opened from the Heroku app’s dashboard (Resources section) or with heroku addons:open papertrail in the terminal.

Troubleshooting Routing Problems Using Papertrail

A great way to examine Heroku router logs is by using the Papertrail solution. It’s easy to isolate them in order to filter out all the noise from multiple log sources: simply click on the “heroku/router” program name in any log message, which will automatically search for “program:heroku/router” in the Event Viewer:

Fig 3. Tail of Heroku router logs in Papertrail, 500 app error selected. © 2018 SolarWinds. All rights reserved.

Monitor HTTP 404s

How do you know that your users are finding your content, and that it’s up to date? 404 Not Found errors are what a client receives when the URL’s path is not found; examples would be a misspelled file name or a missing app route. We want to make sure these types of errors remain uncommon, because otherwise users are either hitting dead ends or seeing irrelevant content in the app!

With Papertrail, setting up an alert to monitor the amount of 404s returned by your app is easy and convenient. One way to do it is to search for “status=404” in the Event Viewer, and then click on the Save Search button. This will bring up the Save Search popup, along with the Save & Setup Alert option:

Fig 4. Save a log search and set up an alert with a single action © 2018 SolarWinds. All rights reserved.

The following screen gives us the alert delivery options, such as email, Slack messages, or push notifications; we can even publish all matching events as a custom metric for application performance management tools such as AppOptics™.

Troubleshoot 500 errors quickly

Fig 5. HTTP 500 Internal Server Error from herokuapp.com. © 2018 Google LLC. All rights reserved.

Let’s say an HTTP 500 error is happening on your app after it’s deployed. A great feature of Papertrail is that it makes the request_id in log messages clickable. Simply click it, or copy it and search for it in the Event Viewer, to find all the app logs behind the internal problem, along with the detailed error message from your application’s code.

Conclusion

Heroku router logs are the glue between web traffic and (sometimes intangible) errors in your application code. It makes sense to give them special focus when monitoring a wide range of issues, because they often indicate customer-facing problems that we want to avoid or address ASAP. Add the Papertrail add-on to Heroku to get more powerful ways to monitor router logs.

Sign up for a 30-day free trial of Papertrail and start aggregating logs from all your Heroku apps and other sources. You may learn more about the Papertrail advanced features in its Heroku Dev Center article.

Read more
0 0 343
Product Manager
Product Manager

Page load time is inversely related to page views and conversion rates. While that’s probably not a controversial statement, as the causality is intuitive, there is empirical data from industry leaders such as Amazon, Google, and Bing to back it up, in High Scalability and O’Reilly’s Radar, for example.

As web technology has become much more complex over the last decade, the issue of performance has remained a challenge as it relates to user experience. Fast forward to 2018, and UX is identified as a key requirement for business success by CIOs and CDOs.

In today’s growing ecosystem of competing web services, the undeniable reality remains that performance impacts business and it can represent a major competitive (dis)advantage. Whether your application relies on AWS, Azure, Heroku, Salesforce, Cloud Foundry, or any other SaaS platform, consider these five tips for monitoring SaaS services.

1. Realize the Importance of Monitoring

In case we haven’t established that app performance is critical for business success, let’s look at research done in the online retail sector.

“E-commerce sites must adopt a zero-tolerance policy for any performance issues that will impact customer experience [in order to remain competitive]” according to Retail Systems Research. Their conclusion is that performance management must shift from being considered an IT issue to being a business matter.

We can take this concept into more specific terms, as stated in our article series on Building a SaaS Service for an Unknown Scale. “Treat scalability and reliability as product features; this is the only way we can build a world-class SaaS application for unknown scale.”

Data from Measuring the Business Impact of IT Through Application Performance (2015).

End users have come to expect very fast, real-time-like interaction with most software, regardless of the system complexities behind the scenes. This means that commercial applications and SaaS services need to be built and integrated with performance in mind at all times. And so, knowing how to measure their performance from day one is paramount. Logs extend application performance monitoring (APM) by giving you deeper insights into the causes of performance problems as well as application errors that can cause user experience problems.

2. Incorporate a Monitoring Strategy Early On

In today’s world, planning for your SaaS service’s successful adoption to take time (and thus worrying about its performance and UX later) is like selling 100 tickets to a party but only beginning preparations on the day of the event. Needless to say, such a plan is prone to produce disappointed customers, and it can even destroy a brand. Fortunately, with SaaS monitoring solutions like SolarWinds® Loggly®, it’s not time-consuming or expensive to implement monitoring.

In fact, letting scalability become a bottleneck is the first of the Six Critical SaaS Engineering Mistakes to Avoid we published some time ago. We recommend defining realistic adoption goals and scenarios in the early project stages, and mapping them into performance, stress, and capacity testing. To run these tests, you’ll need to be able to monitor specific app traffic, errors, user engagement, and other metrics that tech and business teams need to define together.

A good place to start is with the Four Golden Signals described in Google’s Monitoring Distributed Systems book chapter: Latency, Traffic, Errors, and Saturation. Finally, and most importantly from the business perspective, your key metrics can be used as service level indicators (SLIs), which are measures of the service level provided to customers.

Based on your SLIs and adoption goals, you’ll be able to establish service level objectives (SLOs) so your ops team can target specific availability levels (uptime and performance). And, as a SaaS provider, you should plan to offer a service level agreement (SLA). SLAs are contracts with your clients that specify what happens if you fail to meet non-functional requirements; their terms are based on your SLOs but can, of course, be negotiated with each client. SLIs, SLOs, and SLAs are the basis for successful site reliability engineering (SRE).
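
To make the SLI/SLO relationship concrete, here’s a tiny, illustrative calculation; the request counts and the 99.9% objective are made-up numbers:

<?php
// Illustrative SLI/SLO check; the request counts and the 99.9% objective are made up.
$totalRequests  = 1200000;
$failedRequests = 840;      // e.g., HTTP 5xx responses counted from your logs

$sli = 1 - $failedRequests / $totalRequests;   // availability as a service level indicator
$slo = 0.999;                                  // the service level objective to hit

printf("availability SLI: %.4f%% (SLO %.1f%%) -> %s\n",
    100 * $sli, 100 * $slo, $sli >= $slo ? 'within objective' : 'objective missed');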

Apache Preconfigured Dashboards in Loggly can help you watch SLOs in a single click.

For a seamless understanding among tech and business leadership, key performance indicators (KPIs) should be identified for the various business stakeholders. KPIs should then be mapped to the performance metrics that compose each SLA (so they can be monitored). Defining a matrix of KPIs vs. metrics vs. area of business impact as part of the business documentation is a good option. For example, web conversion rate could map to page load time and number of outages, and it impacts sales.

Finally, don’t forget to consider and plan for governance: roles and responsibilities around information (e.g., ownership, prioritization, and escalation rules). The RACI model can help you establish a clear matrix of which team is responsible, accountable, consulted, and informed when there are unplanned events emanating from or affecting business technology.

3. Have Application Logging as a Code Standard

Tech leadership should realize that the main function of logging begins after the initial development is complete. Good logging serves multiple purposes:

  1. Improving debugging during development iterations
  2. Providing visibility for tuning and optimizing complex processes
  3. Understanding and addressing failures of production systems
  4. Business intelligence

“The best SaaS companies are engineered to be data-driven, and there’s no better place to start than leveraging data in your logs.” (From the last of our SaaS Engineering Mistakes)

Best practices for logging are a topic that’s been widely written about; for example, see our article on best practices for creating logs. Here are a few guidelines from that and other sources:

  • Define logging goals and criteria to decide what to log. (Logging absolutely everything produces noise and is needlessly expensive.)
  • Log messages should contain data, context, and a description. They need to be digestible (structured in a way both humans and machines can read).
  • Ensure log messages use an appropriate severity, with standard levels such as FATAL, ERROR, WARN, INFO, DEBUG, and TRACE (see also syslog facilities and levels).
  • Avoid side effects on code execution. In particular, don’t let logging halt your app; use non-blocking calls where possible.
  • External systems: try to log all data that leaves your application and all data that comes in.
  • Use a standard log message format with clear key-value pairs, and/or consider a known standard text format like JSON (see the sketch after this list).
  • Support distributed logging: centralize logs to a shareable, searchable platform such as Loggly.
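
As a minimal sketch of what a structured, JSON-formatted log line can look like in PHP (the field names are hypothetical, and error_log() is just a stand-in for whatever logger your framework provides):

<?php
// Minimal sketch of a structured JSON log line; field names are hypothetical,
// and error_log() stands in for whatever logger your framework provides.
function log_json(string $level, string $message, array $context = []): void
{
    $record = array_merge([
        'timestamp' => gmdate('c'),
        'level'     => $level,
        'message'   => $message,
    ], $context);

    error_log(json_encode($record));
}

log_json('INFO', 'checkout completed', [
    'order_id'   => 'A-1234',
    'latency_ms' => 182,
    'user_tier'  => 'trial',
]);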


Loggly automatically parses several log formats you can navigate with the Fields Explorer.

Every stage in the software development life cycle can be enriched by logs and other metrics. Implementation, integration, staging, and production deployment (especially rolling deploys) will particularly benefit from monitoring such metrics appropriately.

Logs constitute valuable data for your tech team, and invaluable data for your business. Now that you have rich information about the app generated in real time, think about ways to put it to good use.

4. Automate Your Monitoring Configuration

Modern applications are deployed using infrastructure as code (IaC) techniques because they replace fragile server configuration with systems that can easily be torn down and restarted. If your team has made undocumented changes to servers and is too scared to shut them down, those are essentially “pet” servers.

If you manually deploy monitoring configuration on a per-server basis, then you have the potential to lose visibility when servers stop or when you add new ones. If you treat monitoring as something to be automatically deployed and configured, then you’ll get better coverage for less effort in the long run. This becomes even more important when testing new versions of your infrastructure or code, and when recovering from outages. Tools like Terraform, Ansible, Puppet, and CloudFormation can automate not just the deployment of your application but the monitoring of it as well.

Monitoring tools typically have system agents that can be installed on your infrastructure to begin streaming metrics into their service. In the case of applications built on SaaS platforms, there are convenient integrations that plug into well-known ecosystems. For example, Loggly streams and centralizes logs as metrics, and supports dozens of out-of-the-box integrations, including Amazon CloudWatch and the Heroku PaaS platform.

5. Use Alerts on Your Key Metrics

Monitoring solutions like Loggly can alert you to changes in your SLIs over time, such as your error rate. They can help you visually identify the types of errors that occur and when they start, which helps you find root causes and fix problems faster, minimizing the impact on user experience.

Loggly Chart of application errors split by errorCode.

Custom alerts can be created from saved log searches, which act as key metrics of your application’s performance. Loggly even lets you integrate alerts with incident management systems like PagerDuty and OpsGenie.

Adding an alert from a Syslog error log search in Loggly.

In conclusion, monitoring your SaaS service performance is very important because it significantly impacts your business’ bottom line. This monitoring has to be planned for, applied early on, and instrumented for all the stages in the SDLC.

Additionally, we explained how and why correct logging is one of the best sources of key metrics for measuring your monitoring goals during development and production of your SaaS service. Proper logging on an easy-to-use platform such as Loggly will also help your business harness invaluable intel in real time. You can leverage these streams of information to tune your app, improve your service, and discover new revenue models.

Sign up for a free 14-day trial of SolarWinds Loggly to start doing logging right today, and move your SaaS business to the next level of performance control and business intelligence.

Read more
0 0 284