
Monitoring Central


How do PHP logging frameworks fare when pushed to their limits? This analysis can help us decide which option is best for our PHP applications. Performance, speed, and reliability matter for logging frameworks because we want the best performance from our application and minimal data loss.

Our goals for these PHP logging framework benchmark tests are to measure the time different frameworks require to process a large number of log messages across various logging handlers, and to determine which frameworks are more reliable at their limits (dropping few or no messages).

The frameworks we tried are:

  • native PHP logging (error_log and syslog built-in functions)
  • KLogger
  • Apache Log4php
  • Monolog

All of these frameworks use synchronous or “blocking” calls, as PHP functions typically do: the web server execution waits until the function/method call finishes before continuing. As for the handlers: error_log, KLogger, Log4php, and Monolog can write log messages to a text file, while error_log/syslog, Log4php, and Monolog can send messages to the local system logger. Finally, only Log4php and Monolog allow remote syslog connections.

NOTE: The term syslog can refer to various things. In this article, it can mean the PHP function of the same name, the system logger daemon (e.g., syslogd), or a remote syslog server (e.g., rsyslog).

Application and Handlers

For this framework benchmark, we built a PHP CodeIgniter 3 web app with a controller for each logging mechanism. Controller methods echo the microtime difference measured before and after logging, which is useful for manual tests. Each controller method call has a loop that writes 10,000 INFO log messages in the case of file handlers (except error_log, which can only produce E_ERROR messages), or 100,000 INFO messages to syslog. This stresses the logging system without over-burdening the web server request handler.

NOTE: You may see the full app source code at https://github.com/jorgeorpinel/php-logging-benchmark
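As a rough illustration of the shape of these controller methods, here is a minimal, hypothetical sketch (not the exact benchmark code; the class and method names are ours, and the real code is in the repository linked above):

<?php
// Hypothetical CodeIgniter 3 controller illustrating the benchmark loop for a file handler.
class Native_bench extends CI_Controller
{
    public function error_log_file()
    {
        $start = microtime(true);
        // Write 10,000 messages through the handler under test.
        for ($i = 0; $i < 10000; $i++) {
            error_log("Benchmark log message number $i"); // goes to the file set by error_log= in php.ini
        }
        // Echo the elapsed time, which is what we read in the manual tests.
        echo microtime(true) - $start;
    }
}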

 

For the local handlers, we first tested writing to local files and kept track of the number of logs written in each test. We then tested the local system logger handler (which uses the /dev/log UNIX socket by default) and counted the number of logs syslogd wrote to /var/log/syslog.

As for the “remote” syslog server, we set up rsyslog on the system and configured it to accept both TCP and UDP logs, writing them to /var/log/messages. We recorded the number of logs there to determine whether any of them were dropped.

Fig. 1 System Architecture – Each arrow represents a benchmark test.

Methodology

We ran the application locally on Ubuntu with Apache (and mod-php). First, each Controller/method was “warmed up” by requesting that URL with curl, which ensures the PHP source is already precompiled when we run the actual framework benchmark tests. Then we used ApacheBench to stress test the local web app with 100 or 10 serial requests (file or syslog, respectively). For example:

ab -ln 100 localhost:8080/Native/error_log

ab -ln 10 localhost:8080/Monolog/syslog_udp

The total number of log calls in each test (each method) was 1,000,000. We gathered performance statistics from the ApacheBench report for each Controller/method (refer to figure 1).

Please note that in normal operation, actual drop rates should be much smaller, if there are any at all.

Hardware and OS

We ran both the sample app and the tests on an AWS EC2 micro instance, set up as a 64-bit Ubuntu 16.04 Linux box with an Intel(R) Xeon(R) CPU @ 2.40GHz, 1GiB of memory, and an 8 GB SSD.

Native tests

The “native” controller uses a couple of PHP built-in error handling functions. It has two methods: one that calls error_log, which is configured in php.ini to write to a file, and one that calls syslog to reach the system logger. Both functions are used with their default parameters.
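For reference, here is roughly how these two built-in calls are used (a minimal sketch; the ident string and facility passed to openlog are illustrative assumptions, not the benchmark's exact settings):

<?php
// error_log() with default parameters writes to the destination set by error_log= in php.ini.
error_log('Something happened in the app');

// syslog() sends messages to the local system logger (syslogd), via the /dev/log socket by default.
openlog('php-bench', LOG_PID, LOG_USER); // optional; syslog() opens a connection implicitly if omitted
syslog(LOG_INFO, 'Something happened in the app');
closelog();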

error_log to file

By definition, no log messages can be lost by this method as long as the web server doesn’t fail. Its performance when writing to file will depend on the underlying file system and storage speed. Our test results:

error_log (native PHP file logger)
Requests per sec: 23.55 [#/sec] (mean)
Time per request: 42.459 [ms] (mean) (divide by 10,000 logs written per request)
NOTE: error_log can also be used to send messages to system log, among other message types.

syslog

Using error_log when error_log = syslog in php.ini, or simply using the syslog function, we can reach the system logger. This is similar to using the logger command in Linux.

syslog (native PHP system logger)
Requests per sec: 0.25 [#/sec] (mean)
Time per request: 4032.164 [ms] (mean) (divide by 100,000 logs sent per request)

This is typically the fastest logger, and syslogd is at least as robust as the web server, so no messages should be dropped (none were in our tests). Another advantage of the system logger is that it can be configured both to write to a file and to forward logs over the network.

KLogger test

KLogger is a “simple logging class for PHP” whose first stable release was in 2014. It can only write logs to file, but its simplicity helps its performance. KLogger is PSR-3 compliant: it implements the LoggerInterface.
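Typical KLogger usage looks roughly like the sketch below (the log directory and message are hypothetical):

<?php
require 'vendor/autoload.php';

use Katzgrau\KLogger\Logger;
use Psr\Log\LogLevel;

// KLogger writes date-stamped log files into the given directory.
$logger = new Logger(__DIR__ . '/logs', LogLevel::INFO); // hypothetical log directory and level threshold
$logger->info('Benchmark log message', ['iteration' => 1]); // PSR-3 style: message plus context array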

KLogger (simple PHP logging class)
Requests per sec: 14.11 [#/sec] (mean)
Time per request: 70.848 [ms] (mean) (divide by 10,000 = 0.0070848 ms per message)
NOTE: This GitHub fork of KLogger allows local syslog usage as well. We did not try it.

Log4php tests

Log4php, first released in 2010, is one of the suite of loggers that Apache provides for several popular programming languages. Logging to file, it turns out to be a speedy contender, at least when the application runs on Apache, which probably helps Log4php’s performance. In local tests using PHP’s built-in server (the php -S command), it was actually the slowest contender!

Log4php (Apache PHP file logger)
Requests per sec: 18.70 [#/sec] (mean) (× 10,000 ≈ 187,000 messages per sec)
Time per request: 53.470 [ms] (mean) (divide by 10,000 = 0.0053 ms per message)

As for sending to syslog, it was actually our least performant option, but not by far:

Log4php to syslog
Local syslog socket: 0.08 ms per log, 0% dropped
Syslog over TCP/IP: around 24 ms per log, 0% dropped
Syslog over UDP/IP: 0.07 ms per log, 0.15% dropped

Some of the advantages Log4php has, which may offset its lack of performance, are Java-like XML configuration files (same as other Apache loggers, such as the popular log4j), six logging destinations, and three message formats.
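Basic Log4php usage follows this configure-then-fetch pattern (a minimal sketch; the configuration file name and logger name are hypothetical, and the XML file would declare the appenders, such as a file or syslog appender):

<?php
require 'vendor/autoload.php';

// Configure once from an XML (or PHP) configuration file, then fetch named loggers anywhere.
Logger::configure('log4php-config.xml'); // hypothetical config file defining the appender(s)
$logger = Logger::getLogger('benchmark');
$logger->info('Benchmark log message');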

NOTE: Remote syslog over TCP, however, doesn’t seem to be well supported at this time. We had to use the general-purpose LoggerAppenderSocket appender, which was really slow, so we only ran 100,000 messages in that test.

Monolog tests

Monolog, like KLogger, is PSR-3 compliant and, like Log4php, is a full logging framework that can send logs to files, sockets, email, databases, and various web services. It was first released in 2011.

Monolog features many integrations with popular PHP frameworks, making it a popular alternative. Monolog beat its competitor Log4php in our tests, but it is still neither the fastest nor the most reliable option, although it is probably one of the easiest for web developers.
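A minimal Monolog sketch covering the two handler types we benchmarked, file and remote syslog over UDP (the file path, channel name, and syslog host/port are assumptions for illustration):

<?php
require 'vendor/autoload.php';

use Monolog\Logger;
use Monolog\Handler\StreamHandler;
use Monolog\Handler\SyslogUdpHandler;

$log = new Logger('benchmark');                                           // channel name (hypothetical)
$log->pushHandler(new StreamHandler(__DIR__ . '/app.log', Logger::INFO)); // file handler
$log->pushHandler(new SyslogUdpHandler('127.0.0.1', 514));                // remote syslog over UDP (assumed host/port)
$log->info('Benchmark log message', ['iteration' => 1]);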

Monolog (full PHP logging framework)
Requests per sec: 4.93 [#/sec] (mean) (× 10,000 messages)
Time per request: 202.742 [ms] (mean) (divide by 10,000)

Monolog over Syslog:

Monolog over syslog
UNIX socket: 0.062 ms per log, less than 0.01% dropped
TCP: 0.06 ms per log, 0.29% dropped
UDP: 0.079 ms per log, 0% dropped

Now let’s take a look at graphs that summarize and compare all the results above. These charts show the tradeoff between faster native or basic logging methods, which are more limited and lower level in nature, and relatively less performant but full-featured frameworks:

Local File Performance Comparison

Fig 2. Time per message written to file [ms/msg]

Local Syslog Performance and Drop Rates

Log handler or “appender” names vary from framework to framework. For native PHP, we just use the syslog function (KLogger doesn’t support this); in Log4php, it’s a class called LoggerAppenderSyslog; and it’s called SyslogHandler in Monolog.

Fig 3. Time per message sent to syslogd via socket [ms/msg]

Fig 4. Drop rates to syslogd via socket [%]

 

Remote Syslog Performance and Drop Rates

The appenders are LoggerAppenderSocket in Log4php, SocketHandler and SyslogUdpHandler for Monolog.

To measure the drop rates, we leveraged the $RepeatedMsgReduction config param of rsyslog, which collapses identical messages into a single one and a second message with the count of further repetitions. In the case of Log4php, since the default message includes a timestamp that varies in every single log, we forwarded the logs to SolarWinds® Loggly® (syslog setup in seconds) and used a filtered, interactive log monitoring dashboard to count the total logs received.

TCP

Fig 5. Time per message sent via TCP to rsyslog

Fig 6. Drop rates to rsyslog (TCP) [%]

UDP

Fig 7. Time per message sent on UDP to rsyslog
Fig 8. Drop rates to rsyslog (UDP)

Conclusion

Each logging framework is different, and while each could be the best fit for specific projects, our recommendations are as follows. Nothing beats the performance of native syslog for system admins who know their way around syslogd or syslog-ng daemons, or who forward logs to a cloud service such as Loggly. If what’s needed is a simple yet powerful way to log locally to files, KLogger offers PSR-3 compliance and is almost as fast as native error_log, although Log4php does seem to edge it out when the app is running on Apache. For a more complete framework, Monolog seems to be the most well-rounded option, particularly when considering remote logging via TCP/IP.

 

After deciding on a logging framework, your next big decision is choosing a log management solution. Loggly provides unified log analysis and monitoring for all your servers in a single place. You can configure your PHP servers to forward syslog to Loggly or simply use Monolog’s LogglyHandler, which is easy to set up in your app’s code. Try Loggly for free and take control over your PHP application logs.

What are some common problems that can be detected with the handy router logs on Heroku? We’ll explore them and show you how to address them quickly and easily by monitoring Heroku with SolarWinds Papertrail.

 

One of the first cloud platforms, Heroku is a popular platform as a service (PaaS) that has been in development since June 2007. It allows developers and DevOps specialists to easily deploy, run, manage, and scale applications written in Ruby, Node.js, Java, Python, Clojure, Scala, Go, and PHP.

 

To learn more about Heroku, head to the Heroku Architecture documentation.

 

Intro to Heroku Logs

Logging in Heroku is modular, similar to gathering system performance metrics. Logs are time-stamped events that can come from any of the processes running in all application containers (Dynos), system components, or backing services. Log streams are aggregated and fed into the Logplex—a high-performance, real-time system for log delivery into a single channel.

 

Run-time activity, as well as dyno restarts and relocations, can be seen in the application logs. This will include logs generated from within application code deployed on Heroku, services like the web server or the database, and the app’s libraries. Scaling, load, and memory usage metrics, among other structural events, can be monitored with system logs. Syslogs collect messages about actions taken by the Heroku platform infrastructure on behalf of your app. These are two of the most recurrent types of logs available on Heroku.

 

To fetch logs from the command line, we can use the heroku logs command. More details on this command, such as output format, filtering, or ordering logs, can be found in the Logging article of Heroku Devcenter.

$ heroku logs
2019-09-16T15:13:46.677020+00:00 app[web.1]: Processing PostController#list (for 208.39.138.12 at 2010-09-16 15:13:46) [GET]
2018-09-16T15:13:46.677902+00:00 app[web.1]: Rendering post/list
2018-09-16T15:13:46.698234+00:00 app[web.1]: Completed in 74ms (View: 31, DB: 40) | 200 OK [http://myapp.heroku.com/]
2018-09-16T15:13:46.723498+00:00 heroku[router]: at=info method=GET path='/posts' host=myapp.herokuapp.com' fwd='204.204.204.204' dyno=web.1 connect=1ms service=18ms status=200 bytes=975
# © 2018 Salesforce.com. All rights reserved.

Heroku Router Logs

Router logs are a special case of logs that exist somewhere between the app logs and the system logs—and are not fully documented on the Heroku website at the time of writing. They carry information about HTTP routing within Heroku Common Runtime, which manages dynos isolated in a single multi-tenant network. Dynos in this network can only receive connections from the routing layer. These routes are the entry and exit points of all web apps or services running on Heroku dynos.

 

Tail router only logs with the heroku logs -tp router CLI command.

$ heroku logs -tp router
2018-08-09T06:24:04.621068+00:00 heroku[router]: at=info method=GET path='/db' host=quiet-caverns-75347.herokuapp.com request_id=661528e0-621c-4b3e-8eef-74ca7b6c1713 fwd='104.163.156.140' dyno=web.1 connect=0ms service=17ms status=301 bytes=462 protocol=https
2018-08-09T06:24:04.902528+00:00 heroku[router]: at=info method=GET path='/db/' host=quiet-caverns-75347.herokuapp.com request_id=298914ca-d274-499b-98ed-e5db229899a8 fwd='104.163.156.140' dyno=web.1 connect=1ms service=211ms status=200 bytes=3196 protocol=https
2018-08-09T06:24:05.002308+00:00 heroku[router]: at=info method=GET path='/stylesheets/main.css' host=quiet-caverns-75347.herokuapp.com request_id=43fac3bb-12ea-4dee-b0b0-2344b58f00cf fwd='104.163.156.140' dyno=web.1 connect=0ms service=3ms status=304 bytes=128 protocol=https
2018-08-09T08:37:32.444929+00:00 heroku[router]: at=info method=GET path='/' host=quiet-caverns-75347.herokuapp.com request_id=2bd88856-8448-46eb-a5a8-cb42d73f53e4 fwd='104.163.156.140' dyno=web.1 connect=0ms service=127ms status=200 bytes=7010 protocol=https
# Fig 1. Heroku router logs in the terminal

Heroku routing logs always start with a timestamp and the “heroku[router]” source/component string, and then a specially formatted message. This message begins with either “at=info”, “at=warning”, or “at=error” (log levels), and can contain up to 14 other detailed fields such as:

  • Heroku error “code” (optional) – Heroku-specific error codes that complement the HTTP status codes; present for all errors and warnings, and some info messages
  • Error “desc” (optional) – Description of the error, paired with the codes above
  • HTTP request “method”, e.g., GET or POST – May be related to some issues
  • HTTP request “path” – URL location for the request; useful for knowing where to check in the application code
  • HTTP request “host” – Host header value
  • The Heroku HTTP Request ID – Can be used to correlate router logs to application logs
  • HTTP request “fwd” – X-Forwarded-For header value
  • Which “dyno” serviced the request – Useful for troubleshooting specific containers
  • “connect” – Time (ms) spent establishing a connection to the web server(s)
  • “service” – Time (ms) spent proxying data between the client and the web server(s)
  • HTTP response code or “status” – Quite informative in case of issues
  • Number of “bytes” transferred in total for this web request

 

Common Problems Observed with Router Logs

Examples in this article are manually color-coded. Typical ways to address the issues shown below are also provided for context.

 

Common HTTP Status Codes

404 Not Found Error

Problem: Error accessing nonexistent paths (regardless of HTTP method):

2018-07-30T17:10:18.998146+00:00 heroku[router]: at=info method=POST path='/saycow' host=heroku-app-log.herokuapp.com request_id=e5634f81-ec54-4a30-9767-bc22365a2610 fwd='187.220.208.152' dyno=web.1 connect=0ms service=15ms status=404 bytes=32757 protocol=https
2018-07-27T22:09:14.229118+00:00 heroku[router]: at=info method=GET path='/irobots.txt' host=heroku-app-log.herokuapp.com request_id=7a32a28b-a304-4ae3-9b1b-60ff28ac5547 fwd='187.220.208.152' dyno=web.1 connect=0ms service=31ms status=404 bytes=32769 protocol=https

Solution: Implement or change those URL paths in the application or add the missing files.

500 Server Error

Problem: There’s a bug in the application:

2018-07-31T16:56:25.885628+00:00 heroku[router]: at=info method=GET path='/' host=heroku-app-log.herokuapp.com request_id=9fb92021-6c91-4b14-9175-873bead194d9 fwd='187.220.247.218' dyno=web.1 connect=0ms service=3ms status=500 bytes=169 protocol=https

Solution: The application logs have to be examined to determine the cause of the internal error in the application’s code. Note that HTTP Request IDs can be used to correlate router logs against the web dyno logs for that same request.

Common Heroku Error Codes

 

Other problems commonly detected by router logs can be explored in the Heroku Error Codes. Unlike HTTP codes, these error codes are not standard and only exist in the Heroku platform. They give more specific information on what may be producing HTTP errors.

H14 – No web dynos running

Problem: App has no web dynos set up:

2018-07-30T18:34:46.027673+00:00 heroku[router]: at=error code=H14 desc='No web processes running' method=GET path='/' host=heroku-app-log.herokuapp.com request_id=b8aae23b-ff8b-40db-b2be-03464a59cf6a fwd='187.220.208.152' dyno= connect= service= status=503 bytes= protocol=https

Notice that the above case is an actual error message, which includes both Heroku error code H14 and a description. HTTP 503 means “service currently unavailable.”

Note that Heroku router error pages can be customized. These apply only to errors where the app doesn’t respond to a request, e.g., a 503.

Solution: Use the heroku ps:scale command to start the app’s web server(s).

 

H12 – Request timeout

Problem: There’s a request timeout (app takes more than 30 seconds to respond):

2018-08-18T07:11:15.487676+00:00 heroku[router]: at=error code=H12 desc='Request timeout' method=GET path='/sleep-30' host=quiet-caverns-75347.herokuapp.com request_id=1a301132-a876-42d4-b6c4-a71f4fe02d05 fwd='189.203.188.236' dyno=web.1 connect=1ms service=30001ms status=503 bytes=0 protocol=https

Error code H12 indicates the app took over 30 seconds to respond to the Heroku router.

Solution: Code that requires more than 30 seconds must run asynchronously (e.g., as a background job) in Heroku. For more info read Request Timeout in the Heroku DevCenter.

H18 – Server Request Interrupted

Problem: The application encountered too many requests (server overload):

2018-07-31T18:52:54.071892+00:00 heroku[router]: sock=backend at=error code=H18 desc='Server Request Interrupted' method=GET path='/' host=heroku-app-log.herokuapp.com request_id=3a38b360-b9e6-4df4-a764-ef7a2ea59420 fwd='187.220.247.218' dyno=web.1 connect=0ms service=3090ms status=503 bytes= protocol=https

Solution: This problem may indicate that the application needs to be scaled up, or the app performance improved.

H80 – Maintenance mode

Problem: Maintenance mode generates an info router log with error code H80:

2018-07-30T19:07:09.539996+00:00 heroku[router]: at=info code=H80 desc='Maintenance mode' method=GET path='/' host=heroku-app-log.herokuapp.com request_id=1b126dca-1192-4e98-a70f-78317f0d6ad0 fwd='187.220.208.152' dyno= connect= service= status=503 bytes= protocol=https

Solution: Disable maintenance mode with heroku maintenance:off.

 

Papertrail

Papertrail™ is a cloud log management service designed to aggregate Heroku app logs, text log files, and syslogs, among many others, in one place. It helps you to monitor, tail, and search logs via a web browser, command-line, or an API. The Papertrail software analyzes log messages to detect trends, and allows you to react instantly with automated alerts.

 

The Event Viewer is a live aggregated log tail with auto-scroll, pause, search, and other unique features. Everything in log messages is searchable, and new logs still stream in real time in the event viewer when searched (or otherwise filtered). Note that Papertrail reformats the timestamp and source in its Event Viewer to make it easier to read.

Viewer Live Pause
Fig 2. The Papertrail Event Viewer. © 2018 Solarwinds. All rights reserved.

Provisioning Papertrail on your Heroku apps is extremely easy: run heroku addons:create papertrail from the terminal. (See the Papertrail article in Heroku’s DevCenter for more info.) Once set up, the add-on can be opened from the Heroku app’s dashboard (Resources section) or with heroku addons:open papertrail in the terminal.

 

Troubleshooting Routing Problems Using Papertrail

A great way to examine Heroku router logs is by using the Papertrail solution. It’s easy to isolate them in order to filter out all the noise from multiple log sources: simply click on the “heroku/router” program name in any log message, which will automatically search for “program:heroku/router” in the Event Viewer:

Heroku router viewer
Fig 3. Tail of Heroku router logs in Papertrail, 500 app error selected. © 2018 SolarWinds. All rights reserved.

 

Monitor HTTP 404s

How do you know that your users are finding your content, and that it’s up to date? 404 Not Found errors are what a client receives when the URL’s path is not found. Examples would be a misspelled file name or a missing app route. We want to make sure these types of errors remain uncommon, because otherwise users are either hitting dead ends or seeing irrelevant content in the app!

 

With Papertrail, setting up an alert to monitor the amount of 404s returned by your app is easy and convenient. One way to do it is to search for “status=404” in the Event Viewer, and then click on the Save Search button. This will bring up the Save Search popup, along with the Save & Setup Alert option:

Save a search
Fig 4. Save a log search and set up an alert with a single action © 2018 SolarWinds. All rights reserved.

 

The following screen will give us the alert delivery options, such as email, Slack messages, push notifications, or even publishing all matching events as a custom metric for application performance management tools such as AppOptics™.

Troubleshoot 500 errors quickly

500 error on Heroku
Fig 5. HTTP 500 Internal Server Error from herokuapp.com. © 2018 Google LLC. All rights reserved.

 

Let’s say an HTTP 500 error is happening on your app after it’s deployed. A great feature of Papertrail is that it makes the request_id in log messages clickable. Simply click on it, or copy it and search for it in the Event Viewer, to find all the app logs behind the internal problem, along with the detailed error message from your application’s code.

 

Conclusion

Heroku router logs are the glue between web traffic and (sometimes intangible) errors in your application code. It makes sense to give them special focus when monitoring a wide range of issues because they often indicate customer-facing problems that we want to avoid or address ASAP. Add the Papertrail add-on to Heroku to get more powerful ways to monitor router logs.

 

Sign up for a 30-day free trial of Papertrail and start aggregating logs from all your Heroku apps and other sources. You may learn more about the Papertrail advanced features in its Heroku Dev Center article.

Look back at almost any online technology business 10, or even 5, years ago and you’d see a clear distinction between what the CTO and CMO did in their daily roles. The former would oversee the building of technology and products whilst the latter would drive the marketing that brought in the customers to use said technology. In short, the two took care of two very different sides of the same coin.

 

Marketing departments traditionally measure their success against KPIs such as the number of conversions a campaign brought in versus the cost of running it. Developers measure their performance on how quickly and effectively they develop new technologies.

 

Today, companies are shifting focus towards a customer-centric approach, where customer experience and satisfaction are paramount. After all, how your customers feel about your products can make, or break, a business.

Performance diagnostic tools can help you optimize a slow web page but won’t show you whether your visitors are satisfied.

So where do the classic stereotypes that engineers only care about performance and marketers only care about profit fit into the customer-centric business model? The answer is they don’t: in a business where each department works against the same metric — improving the customer experience — having separate KPIs is as redundant as a trap door in a canoe.

 

The only KPI that matters is “are my customers happy?”

 

Developers + Marketing KPIs = True

With technology being integral to any online business, marketers are now in a position where we can gather so much data and in such detail that we are on the front line when it comes to gauging the satisfaction and experience of our customers. We can see what path a visitor took on our website, how long they took to complete their journey and whether they achieved what they set out to do.

 

Armed with this, we stand in a position to influence the technologies developers build and use.

 

Support teams, no longer confined to troubleshooting customer problems, have become Customer Success teams, and they directly impact how developers build products, armed with first-hand data from their customers.

 

So as the lines blur between departments, it shouldn’t come as a surprise that engineering teams should care about marketing metrics. After all, if a product is only as effective as the people who use it, engineers build better products and websites when they know how customers intend to use them.

 

Collaboration is King

“How could engineers possibly make good use of marketing KPIs?” you might ask. After all, the two are responsible for separate ends of your business but can benefit from the same data.

 

Take a vital page on your business’s website: it’s not the fastest page on the net, but its load time is consistent and it achieves its purpose of converting your visitors to customers. Suddenly, your bounce rate shoots up from 5% to 70%.

Ask an engineer to troubleshoot the issue and they might tell you that the page isn’t efficient. It takes 2.7 seconds to load, which is 0.7 seconds over the universal benchmark, and, what’s more, some of the file sizes on your site are huge.

 

Ask a marketer the same question and they might tell you that the content is sloppy, making the purpose of the page unclear. The colors are off-brand, and, what’s more, an important CTA is missing.

 

Even though both have been looking at the same page, they’ve come to two very different conclusions, but the bottom line is that your customer doesn’t care about what went wrong. What matters is that the issue is identified and solved, quickly.

 

Unified Metrics Mean Unified Monitoring

Having unified KPIs across the various teams internal to your organisation means that they should all draw their data from the same source: a single, unified monitoring tool.

 

For businesses where the customer comes first, a new breed of monitoring is evolving that offers organizations this unified view, centred on how your customer experiences your site: Digital Experience Monitoring, or seeing as everything we do is digital, how about we just call it Experience Monitoring?

With Digital Experience Monitoring, your marketers and your engineering teams can follow a customer’s journey through your site, see how they navigated through it, and see where and why interest became a sale or a lost opportunity.

 

Let’s go back to our previous example: both your marketer and your engineer will see that although your bounce rate skyrocketed, the page load time and size stayed consistent. What they might also see is that the onboarding you implemented, which coincides with the bounce rate spike, is confusing your customers, meaning they leave frustrated and unwilling to convert.

 

Digital Experience Monitoring gives a holistic view of your website’s health and helps you answer questions like:

  • Where your visitors come from
  • When they visit your site
  • What they visit and the journey they take to get there
  • How your site’s performance impacts your visitors

By giving your internal teams access to the same metrics, you foster greater transparency across your organization which leads to faster resolution of issues, a deeper knowledge of your visitors and better insights into what your customers love about your products.

 

Pingdom’s Digital Experience Monitoring, Visitor Insights, bridges the gap between site performance and customer satisfaction, meaning you can guess less and know more about how your visitors experience your site.

Page load time is inversely related to page views and conversion rates. While this is probably not a controversial statement, as the causality is intuitive, there is empirical data from industry leaders such as Amazon, Google, and Bing to back it up, in High Scalability and O’Reilly’s Radar, for example.

 

As web technology has become much more complex over the last decade, the issue of performance has remained a challenge as it relates to user experience. Fast forward to 2018, and UX is identified as a key requirement for business success by CIOs and CDOs.

 

In today’s growing ecosystem of competing web services, the undeniable reality remains that performance impacts business and it can represent a major competitive (dis)advantage. Whether your application relies on AWS, Azure, Heroku, Salesforce, Cloud Foundry, or any other SaaS platform, consider these five tips for monitoring SaaS services.

 

1. Realize the Importance of Monitoring

In case we haven’t established that app performance is critical for business success, let’s look at research done in the online retail sector.

 

“E-commerce sites must adopt a zero-tolerance policy for any performance issues that will impact customer experience [in order to remain competitive]” according to Retail Systems Research. Their conclusion is that performance management must shift from being considered an IT issue to being a business matter.

 

We can take this concept into more specific terms, as stated in our article series on Building a SaaS Service for an Unknown Scale. “Treat scalability and reliability as product features; this is the only way we can build a world-class SaaS application for unknown scale.”

Data from Measuring the Business Impact of IT Through Application Performance (2015).

 

End users have come to expect very fast, real-time-like interaction with most software, regardless of the system complexities behind the scenes. This means that commercial applications and SaaS services need to be built and integrated with performance in mind at all times. And so, knowing how to measure their performance from day one is paramount. Logs extend application performance monitoring (APM) by giving you deeper insights into the causes of performance problems as well as application errors that can cause user experience problems.

 

2. Incorporate a Monitoring Strategy Early On

In today’s world, planning for your SaaS service’s successful adoption to take time (and thus worrying about its performance and UX later) is like selling 100 tickets to a party but only beginning preparations on the day of the event. Needless to say, such a plan is prone to produce disappointed customers, and it can even destroy a brand. Fortunately, with SaaS monitoring solutions like SolarWinds® Loggly®, it’s not time-consuming or expensive to implement monitoring.

 

In fact, letting scalability become a bottleneck is the first of the Six Critical SaaS Engineering Mistakes to Avoid we published some time ago. We recommend defining realistic adoption goals and scenarios in early project stages, and mapping them into performance, stress, and capacity testing. To run these tests, you’ll need to be able to monitor specific app traffic, errors, user engagement, and other metrics that tech and business teams need to define together.

 

A good place to start is with the Four Golden Signals described by Google’s Monitoring Distributed Systems book chapter: Latency, Traffic, Errors, and Saturation. Finally, and most importantly from the business perspective, your key metrics can be used as service level indicators (SLI), which are measures of the service level provided to customers.

 

Based on your SLIs and adoption goals, you’ll be able to establish service level objectives (SLOs) so your ops team can target specific availability levels (uptime and performance). And, as a SaaS service provider, you should plan to offer service level agreements (SLAs). SLAs are contracts with your clients that specify what happens if you fail to meet non-functional requirements; the terms are based on your SLOs but can, of course, be negotiated with each client. SLIs, SLOs, and SLAs are the basis for successful site reliability engineering (SRE).

Apache Preconfigured Dashboards in Loggly can help you watch SLOs in a single click.

 

For a seamless understanding among tech and business leadership, key performance indicators (KPIs) should be identified for various business stakeholders. KPIs should then be mapped to the performance metrics that compose each SLA (so they can be monitored). Defining a matrix of KPIs vs. metrics vs. areas of business impact as part of the business documentation is a good option. For example, a web conversion rate could map to page load time and number of outages, and it impacts sales.

 

Finally, don’t forget to consider and plan for governance: roles and responsibilities around information (e.g., ownership, prioritization, and escalation rules). The RACI model can help you establish a clear matrix of which team is responsible, accountable, consulted, and informed when there are unplanned events emanating from or affecting business technology.

 

3. Have Application Logging as a Code Standard

Tech leadership should realize that the main function of logging begins after the initial development is complete. Good logging serves multiple purposes:

  1. Improving debugging during development iterations
  2. Providing visibility for tuning and optimizing complex processes
  3. Understanding and addressing failures of production systems
  4. Business intelligence

“The best SaaS companies are engineered to be data-driven, and there’s no better place to start than leveraging data in your logs.” (From the last of our SaaS Engineering Mistakes)

 

Best practices for logging is a topic that’s been widely written about. For example, see our article on best practices for creating logs. Here are a few guidelines from that and other sources:

  • Define logging goals and criteria to decide what to log. (Logging absolutely everything produces noise and is needlessly expensive.)
  • Log messages should contain data, context, and description. They need to be digestible (structured in a way that both humans and machines can read them).
  • Ensure that log messages are appropriate in severity, using standard levels such as FATAL, ERROR, WARN, INFO, DEBUG, and TRACE (see also syslog facilities and levels).
  • Avoid side effects on the code execution. In particular, don’t let logging halt your app; use non-blocking calls.
  • External systems: try logging all data that goes out from your application and comes in.
  • Use a standard log message format with clear key-value pairs, and/or consider a known structured text format such as JSON (see the sketch after this list).
  • Support distributed logging: centralize logs to a shareable, searchable platform such as Loggly.
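To illustrate the key-value/JSON point above, here is a minimal PHP sketch using Monolog’s JSON formatter (the channel name and fields are made up for the example, not prescribed by Loggly):

<?php
require 'vendor/autoload.php';

use Monolog\Logger;
use Monolog\Handler\StreamHandler;
use Monolog\Formatter\JsonFormatter;

// Emit one JSON object per line so both humans and log platforms (e.g., Loggly) can parse the fields.
$handler = new StreamHandler('php://stdout', Logger::INFO);
$handler->setFormatter(new JsonFormatter());

$log = new Logger('checkout'); // hypothetical channel name
$log->pushHandler($handler);
$log->warning('Payment retry', ['order_id' => 'A-1001', 'attempt' => 2, 'latency_ms' => 840]);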


Loggly automatically parses several log formats you can navigate with the Fields Explorer.

 

Every stage in the software development life cycle can be enriched by logs and other metrics. Implementation, integration, staging, and production deployment (especially rolling deploys) will particularly benefit from monitoring such metrics appropriately.

 

Logs constitute valuable data for your tech team, and invaluable data for your business. Now that you have rich information about the app generated in real time, think about ways to put it to good use.

 

4. Automate Your Monitoring Configuration

Modern applications are deployed using infrastructure as code (IaC) techniques because they replace fragile server configuration with systems that can be easily torn down and restarted. If your team has made undocumented changes to servers and are too scared to shut them down, they are essentially “pet” servers.

 

If you manually deploy monitoring configuration on a per-server basis, then you have the potential to lose visibility when servers stop or when you add new ones. If you treat monitoring as something to be automatically deployed and configured, then you’ll get better coverage for less effort in the long run. This becomes even more important when testing new versions of your infrastructure or code, and when recovering from outages. Tools like Terraform, Ansible, Puppet, and CloudFormation can automate not just the deployment of your application but the monitoring of it as well.

Monitoring tools typically have system agents that can be installed on your infrastructure to begin streaming metrics into their service. In the case of applications built on SaaS platforms, there are convenient integrations that plug into well-known ecosystems. For example, Loggly streams and centralizes logs as metrics and supports dozens of out-of-the-box integrations, including Amazon CloudWatch and the Heroku PaaS platform.

 

5. Use Alerts on Your Key Metrics

Monitoring solutions like Loggly can alert you to changes in your SLIs over time, such as your error rate. They can help you visually identify the types of errors that occur and when they start. This will help you identify root causes and fix problems faster, minimizing the impact on user experience.

Loggly Chart of application errors split by errorCode.

 

Custom alerts can be created from saved log searches, which act as key metrics of your application’s performance. Loggly even lets you integrate alerts with incident management systems like PagerDuty and OpsGenie.

Adding an alert from a Syslog error log search in Loggly.

 

In conclusion, monitoring your SaaS service performance is very important because it significantly impacts your business’s bottom line. This monitoring has to be planned for, applied early on, and instrumented at all stages of the SDLC.

 

Additionally, we explained how and why correct logging is one of the best sources of key metrics for measuring your monitoring goals during development and production of your SaaS service. Proper logging on an easy-to-use platform such as Loggly will also help your business harness invaluable intel in real time. You can leverage these streams of information to tune your app, improve your service, and discover new revenue models.

 

Sign up for a free 14-day trial of SolarWinds Loggly to start doing logging right today, and move your SaaS business to the next level of performance control and business intelligence.

Let’s dream for a while—imagine your databases. All of them are running fast and smooth. There’s no critical issue, no warnings. All requests are handled immediately, and the response time is practically immeasurable. Sounds like database nirvana, doesn’t it? Now, let’s face reality. You resolved all the critical issues of the database, but people still report slowdowns. Everything looks good at first glance, but your sixth sense tells you something bad is happening under the surface. You could start shooting in the dark and hope you hit the target, or you could get more information about what’s going on inside the database and make a single, surgically precise cut to solve the problem.

 

We’ve got good news for you. SolarWinds has a new tool called SQL Plan Warnings. For the first time, you can inspect the list of queries that have warnings without spending hours on manual and labor-intensive work. Oh, and we almost forgot to mention—this tool is available for you right now for free.

Why do we believe the free SQL Plan Warnings tool can help you improve your databases? Well, the SQL Server optimizer often comes up with bad plans that carry warnings. These can cause increased resource consumption, increased wait time, and unnecessary end-user or customer angst. For these reasons, a database professional should look at them. But we don’t always have the time or resources to do so.

 

The free SQL Plan Warnings tool at a glance:

  • Gives you unique visibility into plan warnings that can be easily overlooked and can affect query performance
  • Sorts all warnings by consumed CPU time, elapsed time, or executions
  • Filters results by warning type or by specific keywords
  • Lets you investigate plan warnings, query text, or the complete query plan in a single click
  • Requires no installation—just download the tool and run it
  • Runs on Microsoft Windows and macOS

 

And what can SQL Plan Warnings check for you?

  • Spill to TempDB – Sorts that are larger than estimated can spill to disk via TempDB. This can dramatically slow down queries. There are two similar warnings that fall into this category.
  • No join predicates – The query does not properly join tables/objects, which can cause Cartesian products and slow queries.
  • Implicit conversion – A column of data is being converted to another data type, which can prevent a query from using an index.
  • Missing indexes – SQL Server is telling us there is an index that may help performance.
  • Missing column statistics – If statistics are missing, it can lead to bad decisions by the optimizer.
  • Lookup warning – An index is being used, but it is not a covering index, and a visit back to the table is required to complete the query.

 

The free SQL Plan Warnings tool brings a fresh new feature to your database management capabilities and gives you another tool to improve query performance. Download it here today and be another step closer to our dream—everything in a database running fast and smooth with no critical issues and no warnings.

When development started on NGINX in 2002, the goal was to develop a web server which would be more performant than Apache had been up to that point. While NGINX may not offer all of the features available in Apache, its default configuration can handle approximately four times the number of requests per second while using significantly less memory.

 

While switching to a web server with better performance seems like a no-brainer, it’s important that you have a monitoring solution in place to ensure that your web server is performing optimally, and that users who are visiting the NGINX-hosted site receive the best possible experience. But how do we ensure that the experience is as performant as expected for all users?

 

Monitoring!

 

This article is meant to assist you in putting together a monitoring plan for your NGINX deployments. We’ll look at what metrics you should be monitoring, why they are important, and putting a monitoring plan in place using SolarWinds® AppOptics™.

 

Monitoring is a Priority

 

As engineers, we all understand and appreciate the value that monitoring provides. In the age of DevOps, however, when engineers are responsible for both the engineering and deployment of solutions into a production environment, monitoring is often relegated to the list of things we plan to do in the future. In order to be the best engineers we can be, monitoring should be the priority from day one.

 

Accurate and effective monitoring allows us to test the efficiency of our solutions, and help identify and troubleshoot inefficiencies and other potential problems. Once the solution has moved to requiring operational support, monitoring allows us to ensure that the application is running efficiently and alerting us when things go wrong. An effective monitoring plan should help to identify problems before they start, allowing engineers to resolve issues proactively, instead of being purely reactive.

 

Specific Metrics to Consider with NGINX

 

Before we can develop a monitoring plan, we need to know what metrics are available for monitoring, understand what they mean, and how we can use them. There are two distinct groups of metrics we should be concerned with—metrics related to the web server itself, and those related to the underlying infrastructure.

 

While a highly performant web server like NGINX may be able to handle more requests and traffic, it is vital that the machine hosting the web server has the necessary resources as well. Each metric represents a potential limit to the performance of your application. Ultimately, you want to ensure your web server and underlying infrastructure are able to operate efficiently without approaching those limits.

 

NGINX Web Server-specific Metrics

 

  • Current Connections
    Indicates the number of active and waiting client connections with the server. This may include actual users and automated tasks or bots.
  • Current Requests
    Each connection may be making one or more requests to the server. This number indicates the total count of requests coming in.
  • Connections Processed
    This shows the number of connections that have been accepted and handled by the server. Dropped connections can also be monitored.

 

Infrastructure-specific Metrics

  • CPU Usage
    An indication of the processing usage of the underlying machine. This should be measured as utilization across all cores, if using a multi-core machine.
  • Memory Usage
    Measurement of the memory currently in use on the machine.
  • Swap Usage
    Swap is what the host machine uses when it runs out of memory or if the memory region has been unused for a period of time. It is significantly slower, and is generally only used in an emergency. When an application begins using swap space, it’s usually an indicator that something is amiss.
  • Network Bandwidth
    Similar to traffic, this is a measurement of information flowing in and out of the machine. Again, load units are important to monitor here as well.
  • Disk Usage
    Even if the web server is not physically storing files on the host machine, space is required for logging, temporary files, and other supporting files.
  • Load
    Load is a performance metric which combines many of the other metrics into a simple number. A common rule of thumb is the load on the machine should be less than the number of processing cores.

 

Let’s look at how to configure monitoring on your instances with AppOptics, along with building a dashboard which will show each of those metrics.

 

Installing the AppOptics Agent on the Server

 

Before you start, you’ll need an account with AppOptics. If you don’t already have one, you can create a demo account, which will give you 14 days to try the service, free of charge.

 

The first thing to do to allow AppOptics to aggregate the metrics from the server is install the agent on all instances. To do this, you’ll need to reference your AppOptics API token when setting up the agent. Log in to your AppOptics account and navigate to the Infrastructure page.

 

Locate the Add Host button, and click on it. It should look similar to the image below.

 

Fig. 2. AppOptics Host Agent Installation

 

I used the Easy Install option when setting up the instances for this article. Ensure that Easy Install is selected, and select your Linux distribution. I used an Ubuntu image in the AWS Cloud, but this will work on almost any Linux server.

 

Note: Prior to installation of the agent, the bottom of the dialog below will not contain the success message.

 

Copy the command from the first box, and then SSH into the server and run the Easy Install script.

 

Fig. 3. Easy Install Script to Add AppOptics Agent to a Server

 

When the agent installs successfully, you should be presented with the following message on your terminal. The “Confirm successful installation” box on the AppOptics agent screen should look similar to the above, with a white on blue checkbox. You should also see “Agent connected.”

 

Fig. 4. Installing the AppOptics Agent on your NGINX Instance

 

Configuring the AppOptics Agent

 

With the agent installed, the next step is to configure NGINX to report metrics to the agent. Navigate back to the Infrastructure page, Integrations tab, and locate the NGINX plugin.

 

Note: Prior to enabling the integration, the “enabled” checkbox won’t be marked.

 

Fig. 5. NGINX Host Agent Plugin

 

Click on the plugin, and the following panel will appear. Follow the instructions in the panel, click Enable Plugin, and your metrics will start flowing from the server into AppOptics.

 

Fig. 6. NGINX Plugin Setup

 

When everything is configured, either click on the NGINX link in the panel’s Dashboard tab, or navigate to the Dashboards page directly, then select the NGINX link to view the default dashboard provided by AppOptics.

 

Working With the NGINX Dashboard

 

The default NGINX dashboard provided by AppOptics offers many metrics related to the performance of the web server that we discussed earlier and should look similar to the image below.

 

Fig. 8. Default AppOptics Dashboard

 

Now we need to add some additional metrics to get a full picture of the performance of our server. Unfortunately, you can’t make changes to the default dashboard, but it’s easy to create a copy and add metrics of your own. Start by clicking the Copy Dashboard button at the top of the screen to create a copy.

 

Create a name for your custom dashboard. For this example, I’m monitoring an application called Retwis, so I’m calling mine “NGINX-Retwis.” It’s also helpful to select the “Open dashboard on completion” option, so you don’t have to go looking for the dashboard after it’s created.

 

Let’s do some customization. First, we want to ensure that we’re only monitoring the instances we need to. We do this by filtering the chart or dashboard. You can find out more about how to set and filter these in the documentation for Dynamic Tags.

 

With our sources filtered, we can add some additional metrics. Let’s look at CPU Usage, Memory Usage, and Load. Click on the Plus button located at the bottom right of the dashboard. For CPU and Memory Usage, let’s add a Stacked chart. We’ll add one for each. Click on the Stacked icon.

 

Fig. 10. Create New Chart

 

In the Metrics search box, type “CPU” and hit enter. A selection of available metrics will appear below. I’m going to select system.cpu.utilization, but your selection may be different depending on the infrastructure you’re using. Select the checkbox next to the appropriate metric, then click Add Metrics to Chart. You can add multiple metrics to the chart by repeating the same process, but we’ll stick with one for now.

 

If you click on Chart Attributes, you can change the scale of the chart, adjust the Y-axis label, and even link it to another dashboard to show more detail for a specific metric. When you’re done, click on the green Save button, and you’ll be returned to your dashboard, with the new chart added. Repeat this for Memory Usage. I chose the “system.mem.used” metric.

 

For load, I’m going to use a Big Number Chart Type, and select the system.load.1_rel metric. When you’re done, your chart should look similar to what is shown below.

 

Fig. 11. Custom Dashboard to View NGINX Metrics

 

Pro tip: You can move charts around by hovering over a chart, clicking on the three dots that appear at the top of the chart, and dragging it around. Clicking on the menu icon on the top right of the chart will allow you to edit, delete, and choose other options related to the chart.

 

Beyond Monitoring

 

Once you have a monitoring plan in place and functioning, the next step is to determine baseline metrics for your application and set up alerts which will be triggered when significant deviations occur. Traffic is a useful baseline to determine and monitor. A significant reduction in traffic may indicate a problem that is preventing clients from accessing the service. A significant increase in traffic would indicate an increase in clients, and may require either an increase in the capacity of your environment (in the case of increased popularity), or, potentially, the deployment of defensive measures in response to a cyberattack.

 

Monitoring your NGINX server is critical as a customer-facing part of your infrastructure. You need to know immediately when there is a sudden change in traffic or connections that could impact the rest of your application or website. AppOptics provides an easy way to monitor your NGINX servers and it typically only takes a few minutes to get started. Learn more about AppOptics infrastructure monitoring and try it today with a free 14-day trial.

Kubernetes is a container orchestrator that provides a robust, dynamic environment for reliable applications. Maintaining a Kubernetes cluster requires proactive maintenance and monitoring to help prevent and diagnose issues that occur in clusters. While you can expect a typical Kubernetes cluster to be stable most of the time, like all software, issues can occur in production. Fortunately, Kubernetes insulates us against most of these issues with its ability to reschedule workloads and replace nodes when issues occur. Still, when cloud providers have availability zone outages, or in constrained environments such as bare metal, being able to debug and successfully resolve problems in our nodes is an important skill to have.

In this article, we will use SolarWinds® AppOptics tracing to diagnose some latency issues with applications running on Kubernetes. AppOptics is a next-generation application performance monitoring (APM) and infrastructure monitoring solution. We’ll use it to trace latency on requests to our Kubernetes pods and identify problems in the network stack.

The Kubernetes Networking Stack

Networking in Kubernetes has several components and can be complex for beginners. To be successful in debugging Kubernetes clusters, we need to understand all of the parts.

 

Pods are the scheduling primitives in Kubernetes. Each pod is composed of one or more containers that can optionally expose ports. However, because pods may share the same host and want to use the same ports, workloads must be scheduled in a way that ensures ports do not conflict with each other on a single machine. To solve this problem, Kubernetes uses a network overlay. In this model, pods get their own virtual IP addresses, allowing different pods to listen on the same port on the same machine.

 

This diagram shows the relationship between pods and network overlays. Here we have two nodes, each running two pods, all connected to each other via a network overlay. The overlay assigns each of these pods an IP, and the pods can listen on the same port despite the conflicts they would have if they were listening at the host level. Network traffic, shown by the arrow connecting pods B and C, is facilitated by the network overlay, and pods have no knowledge of the host’s networking stack.

 

Having pods on a virtualized network solves significant issues with providing dynamically scheduled networked workloads. However, these virtual IPs are randomly assigned, which presents a problem for any service or DNS record relying on pod IPs. Services fix this by providing a stable virtual IP frontend to these pods. Each service maintains a list of backend pods and load balances across them. The kube-proxy component routes requests for these service IPs from anywhere in the cluster.

 

 

This diagram differs slightly from the last one. Although pods may still be running on node 1, we omitted them from this diagram for clarity. We defined a service A that is exposed on port 80 on our hosts. When a request is made, it is accepted by the kube-proxy component and forwarded to pod A1 or A2, which then handles the request. Although the service is exposed on the host, it is also given its own service IP on a separate CIDR from the pod network, so it can be accessed on that IP from within the cluster as well.

 

The network overlay in Kubernetes is a pluggable component. Any provider that implements the Container Network Interface (CNI) APIs can be used as a network overlay, and these overlay providers can be chosen based on the features and performance required. In most environments, you will see overlay networks ranging from the cloud provider’s own (such as on Google Kubernetes Engine or Amazon Elastic Kubernetes Service) to operator-managed solutions such as flannel or Calico. Calico is a network policy engine that happens to include a network overlay. Alternatively, you can disable its built-in overlay and use Calico solely to implement network policy on top of other overlays, such as a cloud provider’s or flannel. This policy layer is used to enforce pod and service isolation, a requirement of most secure environments.

Troubleshooting Application Latency Issues

Now that we have a basic understanding of how networking works in Kubernetes, let’s look at an example scenario. We’ll focus on an example where a networking latency issue led to a network blockage. We’ll show you how to identify the cause of the problem and fix it.

 

To demonstrate this example, we’ll start by setting up a simple two-tier application representing a typical microservice stack. This gives us network traffic inside a Kubernetes cluster, so we can introduce issues with it that we can later debug and fix. It is made up of a web component and an API component that do not have any known bugs and correctly serve traffic.

 

These applications are written in the Go programming language and use the AppOptics agent for Go. If you’re not familiar with Go, the “main” function is the entry point of our application and sits at the bottom of our web tier’s file. The web tier listens on the base path (“/”) and calls out to our API tier using the URL defined in the url constant near the top of the file. The response from our API tier is written to an HTML template and displayed to the user. For brevity’s sake, error handling, middleware, and other good Go development practices are omitted from this snippet.

 

package main

import (
    "context"
    "html/template"
    "io/ioutil"
    "log"
    "net/http"

    "github.com/appoptics/appoptics-apm-go/v1/ao"
)

const url = "http://apitier.default.svc.cluster.local"

func handler(w http.ResponseWriter, r *http.Request) {
    const tpl = `
<html>
  <head>
    <meta charset="UTF-8">
    <title>My Application</title>
  </head>
  <body>
    <h1>{{.Body}}</h1>
  </body>
</html>`

    t, w, r := ao.TraceFromHTTPRequestResponse("webtier", w, r)
    defer t.End()
    ctx := ao.NewContext(context.Background(), t)

    httpClient := &http.Client{}
    httpReq, _ := http.NewRequest("GET", url, nil)

    l := ao.BeginHTTPClientSpan(ctx, httpReq)
    resp, err := httpClient.Do(httpReq)
    defer resp.Body.Close()
    l.AddHTTPResponse(resp, err)
    l.End()

    body, _ := ioutil.ReadAll(resp.Body)
    homepage, _ := template.New("homepage").Parse(tpl)

    data := struct {
        Body string
    }{
        Body: string(body),
    }

    homepage.Execute(w, data)
}

func main() {
    http.HandleFunc("/", ao.HTTPHandler(handler))
    log.Fatal(http.ListenAndServe(":8800", nil))
}

Our API tier code is simple. Much like the web tier, it serves requests from the base path (“/”), but it only returns a string of text. As part of this code, we propagate the context of any traces made to this application under the name “apitier”. This sets our application up for end-to-end distributed tracing.

package main

import (
    "context"
    "fmt"
    "net/http"
    "time"

    "github.com/appoptics/appoptics-apm-go/v1/ao"
)

func query() {
      time.Sleep(2 * time.Millisecond)
}

func handler(w http.ResponseWriter, r *http.Request) {
      t, w, r := ao.TraceFromHTTPRequestResponse("apitier", w, r)
      defer t.End()

      ctx := ao.NewContext(context.Background(), t)
      parentSpan, _ := ao.BeginSpan(ctx, "api-handler")
      defer parentSpan.End()

      span := parentSpan.BeginSpan("fast-query")
      query()
      span.End()

      fmt.Fprintf(w, "Hello, from the API tier!")
}

func main() {
      http.HandleFunc("/", ao.HTTPHandler(handler))
      http.ListenAndServe(":8801", nil)
}

When deployed on Kubernetes and accessed from the command line, these services look like this:

Copyright: Kubernetes®

This application is being served a steady stream of traffic. Because the AppOptics APM agent is turned on and tracing is being used, we can see a breakdown of these requests and the time spent in each component, including distributed services. From the web tier component’s APM page, we can see the following graph:

This view tells us the majority of our time is spent in our API tier, with a brief amount of time spent in the web tier serving this traffic. However, we have an extra “remote calls” section. This section represents untraced time between the web tier and the API tier. For a Kubernetes cluster, this includes our kube-proxy, the network overlay, and any proxies that have not had tracing added to them. It accounts for 1.65ms of a normal request, which adds insignificant overhead in this environment, so we can use it as our “healthy” baseline for this cluster.

Now we will simulate a failure in the network overlay layer. Using a tool satirically named Comcast, we can simulate adverse network conditions. Under the hood, this tool uses iptables and the traffic control (tc) utility, standard Linux utilities for managing network environments. Our test cluster uses Calico as the network overlay, which exposes a tunl0 interface: a local tunnel Calico uses to bridge all network traffic, both to implement the overlay between machines and to enforce policy. We only want to simulate a failure at the network overlay, so we use tunl0 as the device and inject 500ms of latency with a maximum bandwidth of 50Kbps and minor packet loss.
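With Comcast installed on the affected node, that failure injection boils down to a single command along these lines (the flags come from the tool’s documented interface; the packet-loss value shown is illustrative):

$ comcast --device=tunl0 --latency=500 --target-bw=50 --packet-loss=1%

Running the tool again with --stop removes the iptables and tc rules it created.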

Our continuous traffic testing is still running. After a few minutes of new requests, our AppOptics APM graph looks very different:

While our application time and the traced API tier time remained consistent, our remote calls time jumped significantly. We’re now spending 6-20 seconds of our request time just traversing the network stack. Thanks to tracing, it’s clear that this application is operating as expected and the problem lies in another part of our stack. We also have the AppOptics agent for Kubernetes and the CloudWatch integration running on this cluster, so we can look at the host metrics to find more symptoms of the problem:

Our network graph suddenly starts reporting much more traffic, and then stops reporting entirely. This could be a symptom of our network stack handling a large number of requests into our host on the standard interface (eth0), queueing at the Calico tunnel, and then overflowing and preventing any more network traffic from reaching the machine until existing requests time out. This aggregate view of all traffic moving inside the host is deceptive, since it counts every byte passing through internal as well as external interfaces, which explains the extra traffic.

 

We still have the problem of the agent no longer reporting. Because pods use the network overlay by default, the agent reporting back to AppOptics suffers from the same problem our API tier is having. As part of recovering this application and helping prevent this issue from happening again, we would move the AppOptics agent off the network overlay and onto the host network.

 

Even with our host agent either delayed or not reporting at all, we still have the AppOptics CloudWatch metrics for this host turned on, and can get the AWS view of the networking stack on this machine:

 

In this graph, we see that traffic becomes choppy at the start of the event, generally ranging from the 50Kb/s out we see in normal operation all the way up to 250Kb/s. This could be our bandwidth limit and packet loss settings causing bursts of outbound traffic. In any case, there’s a massive discrepancy between the networking inside our Kubernetes cluster and outside of it, which points us to problems with our overlay stack. From here, we would take the node out of service, let Kubernetes automatically schedule our workloads onto other hosts, and proceed with host-level network debugging: looking at our iptables settings, checking flow logs, and verifying the health of our overlay components.
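On the node itself, that host-level debugging starts with the same standard utilities Comcast manipulates. For example:

$ sudo iptables -L -n -v          # list firewall rules with packet and byte counters
$ sudo tc qdisc show dev tunl0    # show queueing disciplines applied to the Calico tunnel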

 

Once we remove these rules to clear the network issue, our traffic quickly returns to normal.

 

The latency drops to such a small value that it’s no longer visible on the graph after 8:05:

Next Steps

Hopefully you are now much more familiar with how the networking stack works in Kubernetes and how to identify problems. A monitoring solution like AppOptics APM can help you monitor the availability of your services and troubleshoot problems faster. A small amount of tracing in your application goes a long way toward identifying the components of your systems that have latency issues.

Version 1.1 of the venerable HTTP protocol powered the web for 18 years.

 

Since then, websites have evolved from static, text-driven documents into interactive, media-rich applications. The fact that the underlying protocol remained unchanged throughout this time goes to show how versatile and capable it is. But as the web grew bigger, its limitations became more obvious.

 

We needed a replacement, and we needed it soon.

 

Enter HTTP/2. Published in early 2015, HTTP/2 optimizes website connections without changing the existing application semantics. This means you can take advantage of HTTP/2’s features such as improved performance, updated error handling, reduced latency, and lower overhead without changing your web applications.

 

Today nearly 84% of modern browsers and 27% of all websites support HTTP/2, and those numbers are gradually increasing.

 

How is HTTP/2 Different from HTTP/1.1?

HTTP/2’s biggest changes impact the way data is formatted and transported between clients and servers.

 

Binary Data Format

HTTP/2 encapsulates data using a binary protocol. With HTTP/1.1, messages are transmitted in plaintext. This makes requests and responses easy to format and even read using a packet analysis tool, but results in increased size due to unnecessary whitespace and inefficient compression.

 

The benefit of a binary protocol is that it allows for more compact, more easily compressible, and less error-prone transmissions.

 

Persistent TCP Connections

In early versions of HTTP, a new TCP connection had to be created for each request and response. HTTP/1.1 introduced persistent connections, allowing multiple requests and responses over a single connection. The problem was that messages were exchanged sequentially, with web servers refusing to accept new requests until previous requests were fulfilled.

 

HTTP/2 simplifies this by allowing for multiple simultaneous downloads over a single TCP connection. After a connection is established, clients can send new requests while receiving responses to earlier requests. Not only does this reduce the latency in establishing new connections, but servers no longer need to maintain multiple connections to the same clients.

 

Multiplexing

Persistent TCP connections paved the way for multiplexed transfers. With HTTP/2, multiple resources can be transferred simultaneously, so clients no longer need to wait for earlier resources to finish downloading before the next one begins. Under HTTP/1.1, website developers used workarounds such as domain sharding to “trick” browsers into downloading resources in parallel; however, this came at the cost of opening many additional TCP connections. HTTP/2 makes this entire practice obsolete.

 

Header Compression and Reuse

In HTTP/1.1, headers are sent uncompressed and repeated for each request. As the number of requests grows, so does the volume of duplicate header information. HTTP/2 eliminates redundant headers and compresses the remaining headers to drastically decrease the amount of data repeated during a session.

 

Server Push

Instead of waiting for clients to request resources, servers can now push resources. This allows websites to preemptively send content to users, minimizing wait times.

 

Does My Site Already Support HTTP/2?

Several major web servers and content delivery networks (CDNs) support HTTP/2. The fastest way to check if your website supports HTTP/2 is to navigate to the website in your browser and open Developer Tools. In Firefox and Chrome, press Ctrl-Shift-I or the F12 key and click the Network tab. Reload the page to populate the table with a list of responses. Right-click the column names in the table and enable the “Protocol” header. This column will show HTTP/2.0 in Firefox or h2 in Chrome if HTTP/2 is supported, or HTTP/1.1 if it’s not.

 

What is HTTP/2, and Will It Really Make Your Site Faster?
The network tab after loading 8bitbuddhism.com©. The website fully supports HTTP/2 as shown in the Protocol column.

 

Alternatively, KeyCDN provides a web-based HTTP/2 test tool. Enter the URL of the website you want to test, and the tool will report back on whether it supports HTTP/2.
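You can also check programmatically. Go’s default HTTP client negotiates HTTP/2 automatically for HTTPS requests, so a few lines are enough to print the protocol a server agrees to (a minimal sketch; replace the URL with your own site):

package main

import (
    "fmt"
    "net/http"
)

func main() {
    // Go's default client negotiates HTTP/2 automatically over TLS.
    resp, err := http.Get("https://www.example.com") // replace with your own site
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // Prints "HTTP/2.0" when HTTP/2 was negotiated, "HTTP/1.1" otherwise.
    fmt.Println(resp.Proto)
}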

 

How Do I Enable HTTP/2 on Nginx?

As of version 1.9.5, Nginx fully supports HTTP/2 via the ngx_http_v2 module. This module comes included in the pre-built packages for Linux and Windows. When building Nginx from source, you will need to enable this module by adding --with-http_v2_module as a configuration parameter.

You can enable HTTP/2 for individual server blocks. To do so, add http2 to the listen directive. For example, a simple Nginx configuration would look like this:

# nginx.conf
server {
    listen 443 ssl http2;
    server_name mywebsite.com;

    root /var/www/html/mywebsite;
}

Although HTTP/2 was originally intended to require SSL, you can use it without SSL enabled. To apply the changes, reload the Nginx service using:

$ sudo service nginx reload

or by invoking the Nginx CLI using:

$ sudo /usr/sbin/nginx -s reload
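You can then confirm the negotiated protocol from the command line with curl, assuming your curl build was compiled with HTTP/2 support:

$ curl -sI --http2 https://mywebsite.com | head -n 1

If the change took effect, the status line should read HTTP/2 rather than HTTP/1.1.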

Benchmarking HTTP/2

To measure the speed difference between HTTP/2 and HTTP/1.1, we ran a performance test on a WordPress site with and without HTTP/2 enabled. The site was hosted on a Google Compute Engine instance with 1 virtual CPU and 1.7 GB of memory. We installed WordPress 4.9.6 in Ubuntu 16.04.4 using PHP 7.0.30, MySQL 5.7.22, and Nginx 1.10.3.

 

To perform the test, we created a recurring page speed check in SolarWinds® Pingdom® to contact the site every 30 minutes. After four measurements, we restarted the Nginx server with HTTP/2 enabled and repeated the process. We then dropped the first measurement for each test (to allow Nginx to warm up), averaged the results, and took a screenshot of the final test’s Timeline.

 

 

The metrics we measured were:
  • Page size: the total combined size of all downloaded resources.
  • Load time: the time until the page finished loading completely.

 

Results Using HTTP/1.1

 

What is HTTP/2, and Will It Really Make Your Site Faster?
Timeline using HTTP/1.1

 

Results Using HTTP/2

 

What is HTTP/2, and Will It Really Make Your Site Faster?
Timeline using HTTP/2

 

And the Winner Is…

With just a simple change to the server configuration, the website performs noticeably better over HTTP/2 than HTTP/1.1. The page load time dropped by over 13% thanks to fewer TCP connections, resulting in a lower time to first byte. As a result of only using two TCP connections instead of four, we also reduced the time spent performing TLS handshakes. There was also a minor drop in overall file size due to HTTP/2’s more efficient binary data format.

 

Conclusion

HTTP/2 is already proving to be a worthy successor to HTTP/1.1. A large number of projects have implemented it, and, with the exception of Opera Mini and UC Browser for Android, mainstream browsers already support it. Whether it can handle the next 18 years of web evolution remains to be seen, but for now, it’s given the web a much-needed performance boost.

 

You can try this same test on your own website using the Pingdom page speed check. Running the page speed check will show you the size and load time of every element. With this data you can tune and optimize your website, and track changes over time.

DevOps engineers wishing to troubleshoot Kubernetes applications can turn to log messages to pinpoint the cause of errors and their impact on the rest of the cluster. When troubleshooting a running application, engineers need real-time access to logs generated across multiple components.

 

Collecting live streaming log data lets engineers:

  • Review container and pod activity
  • Monitor the result of actions, such as creating or modifying a deployment
  • Understand the interactions between containers, pods, and Kubernetes
  • Monitor ingress resources and requests
  • Troubleshoot errors and watch for new or recurring problems

 

The challenge that engineers face is accessing comprehensive, live streams of Kubernetes log data. While some solutions exist today, these are limited in their ability to live tail logs or tail multiple logs. In this article, we’ll present an all-in-one solution for live tailing your Kubernetes logs, no matter the size or complexity of your cluster.

 

The Limitations of Current Logging Solutions

When interacting with Kubernetes logs, engineers frequently use two solutions: the Kubernetes command line interface (CLI), or the Elastic Stack.

 

The Kubernetes CLI (kubectl) is an interactive tool for managing Kubernetes clusters. The default logging tool is the command (kubectl logs) for retrieving logs from a specific pod or container. Running this command with the --follow flag streams logs from the specified resource, allowing you to live tail its logs from your terminal.

 

For example, let’s deploy an Nginx pod under the deployment name papertrail-demo. Using kubectl logs --follow [Pod name], we can view logs from the pod in real time:

$ kubectl logs --follow papertrail-demo-76bf4969df-9gs5w
10.1.1.1 - - [04/Jan/2019:22:42:11 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0" "-"

The main limitation of kubectl logs is that it only supports individual Pods. If we deployed two Nginx pod replicas instead of one, we would need to tail each pod separately. For large deployments, this could involve dozens or hundreds of separate kubectl logs instances.

 

The Elastic Stack (previously the ELK Stack) is a popular open-source log management solution. Although it can ingest and display log data using a web-based user interface, unfortunately, it doesn’t offer support for live tailing logs.

 

What is Papertrail, and How Does It Help?

SolarWinds® Papertrail is a cloud-hosted log management solution that lets you live tail your logs from a central location. Using Papertrail, you can view real-time log events from your entire Kubernetes cluster in a single browser window.

 

When a log event is sent from Kubernetes to Papertrail, Papertrail records the log’s contents along with its timestamp and origin pod. You can view these logs in a continuous stream in your browser using the Papertrail Event Viewer, as well as the Papertrail CLI client or Papertrail HTTP API. Papertrail shows all logs by default, but you can limit these to a specific pod, node, or deployment using a flexible search syntax.

 

For example, let’s increase the number of replicas in our Nginx deployment to three. If we used kubectl logs -f, we would need to run it three times: once for each pod. With Papertrail, we can open the Papertrail Event Viewer and create a search that filters the stream to logs originating from the papertrail-demo deployment. Not only does this show us output from each pod in the deployment, but also the Kubernetes cluster activity related to each pod:


Filtering a live stream of Kubernetes logs using Papertrail.

 

Sending Logs from Kubernetes to Papertrail

The most effective way to send logs from Kubernetes to Papertrail is via a DaemonSet. DaemonSets run a single instance of a pod on each node in the cluster. The pod used in the DaemonSet automatically collects and forwards log events from other pods, Kubernetes, and the node itself to Papertrail.

 

Papertrail provides two DaemonSets:

  • The Fluentd DaemonSet uses Fluentd to collect logs from containers, pods, Kubernetes, and nodes. This is the preferred method for logging a cluster.
  • The Logspout DaemonSet uses logspout to monitor the Docker log stream. This option is limited to log output from containers, not Kubernetes or nodes.

We’ll demonstrate using the Fluentd DaemonSet. From a computer with kubectl installed, download fluentd-daemonset-papertrail.yaml and open it in a text editor. Change the values of FLUENT_PAPERTRAIL_HOST and FLUENT_PAPERTRAIL_PORT to match your Papertrail log destination. Optionally, you can name your instance by changing FLUENT_HOSTNAME. You can also change the Kubernetes namespace that the DaemonSet runs in by changing the namespace parameter. When you are done, deploy the DaemonSet by running:

$ kubectl create -f fluentd-daemonset-papertrail.yaml

In a few moments, logs will start to appear in Papertrail:


Live feed of Kubernetes logs in Papertrail.

 

Best Practices for Live Tailing Kubernetes Logs

To get the most out of your logs, make sure you’re following these best practices.

 

Log All Applications to STDOUT and STDERR

Kubernetes collects logs from Pods by monitoring their STDOUT and STDERR streams. If your application logs to another location, such as a file or remote service, Kubernetes won’t be able to detect it, and neither will your Papertrail DaemonSet. When deploying an application, make sure to route its logs to the standard output stream.
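For a Go service, this can be as simple as pointing the standard library logger at STDOUT (a minimal sketch; the prefix and message are illustrative):

package main

import (
    "log"
    "os"
)

func main() {
    // The standard library logger writes to STDERR by default; both streams are
    // collected by Kubernetes, but sending application logs to STDOUT keeps them
    // separate from runtime errors.
    logger := log.New(os.Stdout, "webtier ", log.LstdFlags)
    logger.Println("listening on :8800")
}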

 

Use the Fluentd DaemonSet

The Logspout DaemonSet is limited to logging containers. The Fluentd DaemonSet, however, will log your containers, pods, and nodes. In addition to logging more resources, Fluentd also logs valuable information such as Pod names, Pod controller activity, and Pod scheduling activity.

 

Open Papertrail Next to Your Terminal

When you’re working on Kubernetes apps and want to debug problems with Pods, keep a browser window with Papertrail open beside or behind your terminal window. This way you can see the results of your actions right after you execute them, and it saves you from having to manually tail logs in your terminal.

 

Group Logs to Make Them Easier to Find

Kubernetes pods (and containers in general) are ephemeral and often have randomly generated names. Unless you specify fixed names, it can be hard to keep track of which pods or containers to filter on. A solution is to use log groups, which let you group logs from a specific application or development team together. This helps you find the logs you need and hide everything else.

 

Save Searches in Papertrail

Papertrail lets you save your searches for creating custom Event Viewer sessions and alerts. You can reopen previously created live tail sessions, share your sessions with team members, or receive an instant notification when new log events arrive in the stream.

 

Conclusion

Kubernetes logs help DevOps teams identify deployment problems and improve the reliability of their applications. Live tailing enables faster troubleshooting by helping developers collect, view, and analyze these logs in real time. To get started with SolarWinds Papertrail, sign up and start logging your Kubernetes cluster in a matter of minutes.

Jenkins X (JX) is an exciting new Continuous Integration and Continuous Deployment (CI/CD) tool for Kubernetes users. It hides the complexities of operating Kubernetes by giving developers a simpler experience to build and deploy their code. You can think of it as creating a serverless-like environment in Kubernetes. As a developer, you don’t need to worry about all the details of setting up environments, creating a CI/CD pipeline, or connecting GitHub to your CI pipeline. All of this and much more is handled by JX. In this article, we’ll introduce you to JX, show you how to use it, and how to monitor your builds and production deployments.

 

What is Jenkins X?

JX was created by James Strachan (creator of Groovy, Apache Camel, and now JX) and was first announced in March 2018. It’s designed from the ground up to be a cloud-native, Kubernetes-only application that not only supports CI/CD, but also makes working with Kubernetes as simple as possible. With one command you can create a Kubernetes cluster, install all the tools you’ll need to manage your application, create build and deployment pipelines, and deploy your application to various environments.

Jenkins is described as an “extensible automation server” that can be configured, via plugins, to be a Continuous Integration Server, a Continuous Deployment hub, or a tool to automate just about any software task. JX provides a specific configuration of Jenkins, meaning you don’t need to know which plugins are required to stand up a CI/CD pipeline. It also deploys numerous applications to Kubernetes to support building your docker container, storing the container in a docker registry, and deploying it to Kubernetes.

Jenkins pipeline builds are driven by adding a Jenkinsfile to your project. JX automates this for you. JX can create new projects (and the required Jenkinsfile) for you or import your existing project and create a Jenkinsfile if you don’t already have one. In short, you don’t need to know anything about Jenkins or Kubernetes to get started with JX. JX will do it all for you.

 

Overview of How JX Works

JX is designed to take all of the guesswork or trial and error approach many teams have used to create a fully functional CI/CD pipeline in Kubernetes. To make a tailored developer experience, JX had to choose which Kubernetes technologies to use. In many ways, JX is like a Linux distribution, but for Kubernetes. JX had to decide, from the plethora of tools available, which ones to use to create a smooth and seamless developer experience in Kubernetes.

To make the transition to Kubernetes simpler, the command line tool jx can drive most of your interactions with Kubernetes. This means you don’t need to know how to use kubectl right away; instead you can slowly adopt kubectl as you become more comfortable in Kubernetes. If you are an experienced Kubernetes user, you’ll use jx for interacting with JX (CI/CD, build logs, and so on) and continue to use kubectl for other tasks.

When you create or import a project using the jx command line tool, JX will detect your project type and create the appropriate Jenkinsfile for you (if it doesn’t already exist), define the required Kubernetes resources for your project (like Helm charts), add your project to GitHub and create the necessary webhooks for your application, build your application in Jenkins, and if all tests pass, deploy your application to a staging environment. You now have a fully integrated Kubernetes application with a CI/CD pipeline ready to go.
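For example, importing an existing project is typically a single command run from the project’s directory (the directory name here is a placeholder, and the prompts and defaults depend on your JX version and Git provider):

cd my-existing-app    # your project directory
jx import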

Your interaction with JX is driven by a few jx commands to set up an environment, create or import an application, and monitor the state of your build pipelines. The developer workflow is covered in the next section. Generally speaking, once set up, you don’t need to interact with JX much; it works quietly in the background, providing CI and CD functionality.

 

Install Jenkins X

To get started using JX, install the jx binary. For Mac OS, you can use brew:

brew tap jenkins-x/jx
brew install jx

Note: When I first tried to create a cluster using JX, it installed kops for me. However, the first time jx tried to use kops, it failed because kops wasn’t on my path. To address this, install kops as well:

brew install kops

Create a Kubernetes Cluster

JX supports most major cloud environments: Google GKE, Azure AKS, Amazon EKS, minikube, and many others. JX has a great video on installing JX on GKE. Here, I’m going to show you how to install JX in Amazon without EKS. Creating a Kubernetes cluster from scratch is very easy:

jx create cluster aws

Since I wasn’t using JX for a production application, I ran into a few gotchas during my install:

  1. When prompted with, “No existing ingress controller found in the kube-system namespace, shall we install one?” say yes.
  2. Assuming you are only trying out JX, when prompted with, “Would you like to register a wildcard DNS ALIAS to point at this ELB address?” say no.
  3. When prompted with, “Would you like wait and resolve this address to an IP address and use it for the domain?” say yes.
  4. When prompted with, “If you don’t have a wildcard DNS setup then set up a new CNAME and point it at: XX.XX.XX.XX.nip.io. Then, use the DNS domain in the next input” accept the default.

The image below shows you the EC2 instances that JX created for your Kubernetes Cluster (master is an m3.medium instance and the nodes are t2.medium instances):

LG IntroJenkinsX 1
AWS EC2 Instances. © 2018 Amazon Web Services, Inc. or its affiliates. All rights reserved.

When you are ready to remove the cluster you just created, you can use this command (JX currently does not provide a delete cluster command):

kops delete cluster

Here’s the full kops command to remove the cluster you just created (you’ll want to use the cluster name and S3 bucket for all kops commands):

kops delete cluster --name aws1.cluster.k8s.local \
  --state=s3://kops-state-xxxxxx-ff41cdfa-ede6-11e8-xx6-acde480xxxx

To add Loggly integration to your Kubernetes cluster, you can follow the steps outlined here.

 

Create an Application

Now that JX is up and running, you are ready to create an application. The quickest way to do this is with the JX quickstart. In addition to the quickstart applications that come with JX, you can also create your own.

To get started, run jx create quickstart and pick the spring-boot-http-gradle quickstart (see the screenshot below for more details):

jx create quickstart

 

LG IntroJenkinsX 2
Creating a Kubernetes cluster using jx create cluster © 2018 Jenkins Project

Note: During the install process, I did run into one issue. When prompted with, “Which organization do you want to use?” make sure you choose a GitHub Org and not your personal account. The first time I ran this, I tried my personal account (which has an org associated with it) and jx create quickstart failed. When I reran it, I chose my org ripcitysoftware and everything worked as expected.

Once your application has been created, it will automatically be deployed to the staging environment for you. One thing I really like about JX is how explicit everything is. There isn’t any confusion between temporary and permanent environments because the environment name is embedded into the application URL (http://spring-boot-http-gradle.jx-staging.xx.xx.xx.xx.nip.io/).

The Spring Boot quickstart application provides you with one REST endpoint:

LG IntroJenkinsX 3
Example Spring Boot HTTP © 2018 Google, Inc

 

Developer Workflow

JX has been designed to support a trunk-based development model promoted by DevOps leaders like Jez Humble and Gene Kim. JX is heavily influenced by the book Accelerate (you can find more here), and as such it provides an opinionated developer workflow. Trunk-based development means releases are built off of trunk (master in Git). Research has shown that teams using trunk-based development are more productive than those using long-lived feature branches. Instead of long-lived feature branches, teams create branches that live only a few hours and contain a few small changes.

Here’s a short overview of trunk-based development as supported by JX. To implement a code change or fix a bug, you create a branch in your project, write tests, and make code changes as needed. (These changes should only take a couple of hours to implement, which means your code change is small.) Push your branch to GitHub and open a Pull Request. Now JX will take over. The webhook installed by JX when it imported your project will trigger a CI build in Jenkins. If the CI build succeeds, Jenkins will notify GitHub the build was successful, and you can now merge your PR into master. Once the PR is merged, Jenkins will create a released version of your application (released from the trunk branch) and deploy it (CD) to your staging environment. When you are ready to promote your application from stage to production, you’ll use the jx promote command.

The development workflow is expected to be:

  1. In git, create a branch to work in. After you’ve made your code changes, commit them and then push your branch to your remote git repository.
  2. Open a Pull Request in your remote git repo. This will trigger a build in Jenkins. If the build is successful, JX will create a preview environment for your PR so you can review and test your changes. To trigger the promotion of your code from Development to Staging, merge your PR.
  3. By default, JX will automatically promote your code to Stage. To promote your code to Production, you’ll need to run this command manually: jx promote app-name --version x.y.z --env production

Monitoring Jenkins X

Monitoring the status of your builds gives you insight into how development is progressing. It will also help you keep track of how often you are deploying apps to various environments.

JX provides you multiple ways to track the status of a build. JX configures Jenkins to trigger a build when a PR is opened or updated. The first place to look for the status of your build is in GitHub itself. Here is a build in GitHub that resulted in a failure. You can clearly see the CI step has failed:

LG IntroJenkinsX 4
GitHub PR Review Web Page. © 2018 GitHub Inc. All rights reserved.

The next way to check on the status of your build is in Jenkins itself. You can navigate to Jenkins in your browser or, from GitHub, you can click the “Details” link to the right of “This commit cannot be built.” Here is the Jenkins UI. You will notice Jenkins isn’t very subtle when a build fails:

LG IntroJenkinsX 5
Jenkins Blue Ocean failed build web page. © 2018 Jenkins Project

A third way to track the status of your build is from the command line, using the jx get activity command:

LG IntroJenkinsX 6
iTerm – output from jx get activity command © 2018 Jenkins Project

If you want to see the low-level details of what Jenkins is logging, you’ll need to look at the container Jenkins is running in. Jenkins is running in Kubernetes like any other application. It’s deployed as a pod and can be found using the kubectl command:

$ kubectl get pods
NAME                      READY     STATUS    RESTARTS   AGE
jenkins-fc467c5f9-dlg2p   1/1       Running   0          2d

Now that you have the name of the Pod, you can access the log directly using this command:

$ kubectl logs -f jenkins-fc467c5f9-dlg2p

 

LG IntroJenkinsX 7
iTerm – output from kubectl logs command © 2018 Jenkins Project

Finally, if you’d like to get the build output log, the log that’s shown in the Jenkins UI, you can use the command below. This is the raw build log that Jenkins creates when it’s building your application. When you have a failed build, you can use this output to determine why the build failed. You’ll find your test failures here along with other errors like failures in pushing your artifacts to a registry. The output below is not logged to the container (and therefore not accessible by Loggly):

$ jx get build log ripcitysoftware/spring-boot-http-gradle/master
view the log at: http://jenkins.jx.xx.xx.xxx.xxx.nip.io/job/ripcitysoftware/job/spring-boot-http-gradle/job/master/2/console
tailing the log of ripcitysoftware/spring-boot-http-gradle/master #2
Push event to branch master
Connecting to https://api.github.com using macInfinity/****** (API Token for accessing https://github.com Git service inside pipelines)

Monitoring in Loggly

One of the principles of a microservice architecture, as described by Sam Newman in Building Microservices, is being highly observable. Specifically, Sam suggests that you aggregate all your logs. A great tool for this is SolarWinds® Loggly. Loggly is designed to aggregate all of your logs into one central location. By centralizing your logs, you get a holistic view of your systems. Deployments can trigger a change in the application that generates errors or leads to instability. When you’re troubleshooting a production issue, one of the first things you want to know is whether something changed. Being able to see deployments in your logs lets you trace a bug back to the deployment that may have introduced it.

To monitor deployments, we need to know what’s logged when a deployment succeeds or fails. This is the message Jenkins logs when a build has completed:

INFO: ripcitysoftware/spring-boot-http-gradle/master #6 completed: SUCCESS

From the above message, we get a few pieces of information: the project name ripcitysoftware/spring-boot-http-gradle, the branch master, the build number #6, and finally the build status SUCCESS.

The metrics you should monitor are:

  • Build status – Whether a build was a success or failure
  • The project name – Which project is being built
  • The build number – Tracks PRs and releases

By tracking the build status, you can see how often builds are succeeding or failing. The project name and build number tell you how many PRs have been opened (look for “PR” in the project name) and how often a release is created (look for “master” in the name).

To track all of the above fields, create one Derived Field in Loggly called jxRelease. Each capture group (the text inside of the parentheses) defines a unique Derived Field in Loggly. Here is the regex you’ll need:

^INFO:(.*)\/.*(master|PR.*) #(.*\d) completed: ([A-Z]+$)$
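Before saving the rule, you can sanity-check the pattern against the sample Jenkins message with a short Go program built on the standard regexp package (a sketch; Loggly’s regex dialect may differ slightly from Go’s RE2):

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`^INFO:(.*)\/.*(master|PR.*) #(.*\d) completed: ([A-Z]+$)$`)
    msg := "INFO: ripcitysoftware/spring-boot-http-gradle/master #6 completed: SUCCESS"

    // Index 0 is the full match; groups 1-4 become the Derived Fields.
    groups := re.FindStringSubmatch(msg)
    fmt.Printf("%q\n", groups[1:])
    // Prints: [" ripcitysoftware/spring-boot-http-gradle" "master" "6" "SUCCESS"]
}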

Here’s the Jenkins build success log-message above as it appears in Loggly after we’ve created the Derived Field. You can see all the fields we are defining highlighted in yellow below the Rule editor:

LG IntroJenkinsX
Loggly – Derived Field editor web page.  © 2018 SolarWinds Worldwide, LLC. All rights reserved.

Please note that Derived Fields are applied to past logs only in the designer tool; Loggly only adds new Derived Fields to new log messages. This means if you’ve already sent an hour of Jenkins output to Loggly and then create the jxBuildXXX fields (as shown above), only new log messages will include these fields.

In the image below, you can see all the Derived Fields that have been parsed in the last 30 minutes. For jxBuildBranchName, there has been one build to stage, and it was successful, as indicated by the value SUCCESS. We also see that nine (9) builds have been pushed to stage, as indicated by the jxBuildNumber field.

 

LG IntroJenkinsX 9
Loggly Search Results web page.  © 2018 SolarWinds Worldwide, LLC. All rights reserved.

Now that these fields are parsed out of the logs, we can filter on them using the Field Explorer. Above, you can see that we have filtered on the master branch. This shows us each time the master branch has changed. When we are troubleshooting a production bug, we can now see the exact time the code changed. If the bug started after a deployment, then the root cause could be the code change. This helps us narrow down the root cause of the problem faster.

We can also track when master branch builds fail and fire an alert to notify our team on Slack or email. Theoretically, this should never happen, assuming we are properly testing the code. However, there could have been an integration problem that we missed, or a failure in the infrastructure. Setting an alert will notify us of these problems so we can fix them quickly.

 

Conclusion

JX is an exciting addition to Jenkins and Kubernetes alike. JX fills a gap that has existed since the rise of Kubernetes: how to assemble the correct tools within Kubernetes to get a smooth and automated CI/CD experience. In addition, JX helps lower the barrier to entry into Kubernetes and Jenkins for CI/CD. JX itself gives you multiple tools and commands to navigate system logs and track build pipelines. Adding Loggly integration to your JX environment is very straightforward. You can easily track the status of your builds and monitor your app’s progression from development to a preview environment, to a staging environment, and finally to production. When you are troubleshooting a critical production issue, you can look at the deployment time to see whether changes in the code caused the issue.

Are you an administrator who’s supporting a small environment and hasn’t yet had the time or budget to invest in a centralized IT monitoring tool? No doubt you are tired of coworkers showing up at your desk or calling about an outage you weren’t yet aware of. If an enterprise-class solution would be overkill, but you don’t have the budget to purchase a licensed solution, ipMonitor® Free Edition might be able to bridge that gap.

 

ipMonitor Free Edition is a fully functional version of our ipMonitor solution for smaller environments.  It’s a standalone, free tool that helps you stay on top of what is going on with your critical network devices, servers, and applications—so you know what’s up, what’s down, and what’s not performing as expected. 

 

ipMonitor Free Edition at a Glance

  • Clear visibility of IT network device, server, and application status
  • Customizable alerting with optional automatic remediation
  • Simple deployment with our startup wizard and alerting recommendations
  • Lightweight installation and maintenance

 

ipMonitor Free Edition is an excellent starting point to more robust, centralized monitoring. It is designed for network and systems administrators with small environments or critical components they need to focus on, and can support up to 50 monitors. Monitors watch a specific aspect of a device, service, or process. Example monitors include: Ping, CPU, memory or disk usage, bandwidth, and response time.

 

Interested in giving it a try?  Download ipMonitor Free Edition today.  If you have any questions, head over to the ipMonitor product forum and start a discussion. 

 

 

 


Calling network engineers, network architects, and network defenders alike. We are happy to announce the arrival of the all-new SolarWinds® Flow Tool Bundle.

 

With this free tool, you can quickly distribute, test, and configure your flow traffic. Showcasing some of SolarWinds signature flow traffic analysis capabilities, the Flow Tool Bundle offers three handy, easy-to-install network traffic analysis tools: SolarWinds NetFlow Replicator, SolarWinds NetFlow Generator, and SolarWinds NetFlow Configurator.

 

So, what exactly can you do with this new addition to the vast family of SolarWinds free tools?

 

Here’s the breakdown:

 

SolarWinds NetFlow Replicator

  • Configure devices to send flow data to a single destination, then replicate the flows to a general-purpose flow analysis platform or even to a security analysis platform
  • Split off production flow streams to test new versions of the flow collector
  • Run sampled flow streams to multiple destinations or only to the destinations you designate
  • Reduce traffic through costly or low-bandwidth WAN links to decrease the volume of network management traffic
  • Enable segmentation of the managed domain to separate destination analysis platforms

 

SolarWinds NetFlow Generator

  • Troubleshoot flow tools to confirm that locally generated simulated traffic is visible in the tool
  • Validate the behavior of load balancing architectures
  • Test firewall rules that span across a network or those that are implemented on a host to confirm that flow traffic can be received
  • Perform performance and capacity lab testing
  • Perform functional testing to confirm that flow volumes are accurately represented
  • Test trigger conditions for newly created alerts and reset the alert behavior
  • Test new NetFlow application definitions
  • Populate traffic for demo environments

 

SolarWinds NetFlow Configurator

  • Analyze network performance
  • Activate NetFlow and find bandwidth hogs
  • Bypass the CLI with an intuitive GUI
  • Set up collectors for NetFlow data
  • Specify collector listening ports
  • Monitor traffic data per interface

 

How do you plan on using your Flow Tool Bundle? Install it today and let us know how you have been leveraging these awesome new free tools!

For more information about the SolarWinds Flow Tool Bundle, have a look at this page. You can also access the Quick Reference Guide on THWACK.

This time of year is always exciting. The seasons change (depending on where you live), commercial buying season ramps up, and shopping lines resemble those of an amusement park in summer. The year is coming to an end, and we are busy shopping, making holiday preparations, traveling, and coming together with family to eat, exchange gifts, and be merry.

 

I’d wager access rights management doesn’t have a top spot on your holiday list. That’s ok. The topic doesn’t exactly exude that cozy holiday feeling. On the contrary, it might make you slightly uncomfortable. 

 

Most IT environments consist of tens, hundreds, or even thousands of servers. Those servers have thousands to tens of thousands of folders, groups, and paths. How can you really know who has access to what? Is your data safe? You have, no doubt, installed security monitoring and protection solutions to help protect the data in those folders and files. You’ve done everything you can, right? Despite all those protections, you still have users with access—but you don’t know who. You don’t know what. In fact, if someone asked you who has access to what, you probably couldn’t answer. It’s a hard question to field unless you have a solution in place giving you the visibility you need. Of course, if an auditor does ask you to answer these questions, your holidays could be spent digging through folders and directories to compile information and provide answers.

 

 

SolarWinds® Access Rights Manager (ARM) helps solve these challenges and more:

 

  • ARM provides a detailed overview of your users’ access rights, allowing you to easily visualize and show where access to resources has been granted erroneously
  • ARM enables standardization and automation of access rights, so you can easily apply the appropriate rights to users through templates
  • ARM helps demonstrate compliance and prevents insider data leakage by helping you achieve the principle of least privilege and giving you full auditability of user access over time

 

Let’s dig into this further.

 

 

ARM gives a detailed overview of your users’ access rights

 

The Active Directory group concept is essential for every administrator. These groups grow organically, and after years of existence and use, they often build up to complex group nesting structures. ARM gives you back control over these group structures.

 

The ARM AD Graph visualizes group structure and depth. Structural problems with these groups become transparent through this visualization.

 

 

 

In addition to the visualization provided by the AD Graph, the ARM dashboard allows a detailed analysis of the group nesting structures and circular nested groups. This enables administrators to work on the weak spots in the AD group structure, establish a flat group structure, and meet Microsoft best practices for group management.

 

With ARM, the issues related to lack of identifiable structures—or giving permissions to too many or the wrong people/groups—belong to the past. Once the group structure has been optimized, ARM allows you to compare any recorded access rights period with your current structure, and shows changes along with documented reasoning.

 

 

ARM enables standardization and automation of access rights

 

Compliance regulations, such as FISMA, GDPR, SOX, PCI DSS, BSI, and others, require administrators to adopt a high level of responsibility to ensure data is protected. Insider data leakage can cost companies large monetary sums in addition to lost customer, vendor, and reseller trust if data gets into the wrong hands. But it’s not always the headline-making data leak issues that harm companies. Employees leaving a company and taking valuable data with them is almost guaranteed without a cohesive access rights strategy to manage, control, and audit user rights—for users throughout the whole company.

 

ARM standardizes access rights across users and gives administrators a comprehensive tool to define, manage, monitor, and audit user access to resources across Active Directory, Exchange, SharePoint, and all your file servers.

 

 

 

ARM empowers administrators to predefine certain roles within the company, efficiently grant or deny rights with one click, and display all higher-level permissions in an easy-to-monitor overview. These different roles can be assigned a data owner (e.g., for department heads) to distribute control for managing access to resources the data owner is responsible for. In addition, this establishes a mindset of distributed access rights control to help ensure users with accurate access rights knowledge are granting and/or denying access appropriately.

 

Data owners, team leads, and IT professionals can be granted access to change personal information about a user, create or delete user accounts, reset passwords, unlock user accounts, or change group memberships centrally from within ARM. This allows the duties and tasks around access rights management to be shared while following standards to ensure full auditability.

 

ARM helps demonstrate compliance and prevents insider data leakage

 

Threats can emerge from the outside as well as the inside. Insider abuse can be a leading cause of data leakage. Of course, it’s not always a malicious insider; in many cases, data leakage is caused by negligent users who have access to resources, and are either compromised or take actions that inadvertently lead to data leakage. ARM takes special care to audit all changes within the ARM Logbook. The Logbook report enables admins and auditors to report on events and persons as needed to support investigations or auditor questions.

 

ARM also includes automated reports designed to meet regulatory compliance initiatives, such as NIST, PCI DSS, HIPAA, and GDPR. The flexible reporting views allow you to ask questions to quickly generate a report, which can be exported in an audit-ready format.

 

As mentioned earlier, ARM allows access rights management to be delegated to assigned staff members—placing control of the access rights assignment with the data owners that know their data. Changes made by these data owners are also audited so nothing goes unmonitored. ARM is designed to make your job easier—it helps you answer the questions you need to answer.

 

ARM is our gift to you this holiday season. It aligns with the SolarWinds mission to make your job as an IT technology professional easy. With Access Rights Manager, we make security easier too; we call it security simplified. If you are thinking of what you can do for yourself this holiday season, consider SolarWinds Access Rights Manager. It could turn out to be the gift that keeps on giving.

Have you adopted Azure cloud services into your IT infrastructure? Do you know how much you paid last month, and for what? And what about forecasting? Are you able to forecast your Azure spending in the current month? If the answer is no, don’t worry; you are not the only one. Unfortunately, Azure billing is really complicated, with more than 15,000 SKUs available, each with its own rate. But SolarWinds is here to help! We’re proud to introduce a brand-new free tool in our portfolio.

Cost Calculator for Azure is a standalone free tool that can help you discover how much you are paying for your Azure cloud services. It’s as easy as it could be: you put the credentials of all your Azure accounts into the tool, and it does all the work for you, telling you how much you really pay and for what, specifically. This tool is designed to help budget holders and sysadmins of businesses of any size who are responsible for cloud resources in their companies.

 

Cost Calculator for Azure at a glance:

  • No installation
  • Support
  • Show the cost of all assigned Azure accounts and their subscription plans. There is no need to run more instances or work with Excel spreadsheets to get an overall number.
  • Show spending in the current month, last month, last quarter, or last year. Still not enough? You can set up your own timeframe to fit your needs.
  • Find orphaned objects
  • Consolidate all spending and show the final expense in your preferred currency.
  • Filter spending

As you can see, Cost Calculator for Azure is a lightweight, easy-to-use tool that can make your IT professional life a little bit easier thanks to better forecasting of your Azure cloud spending. And the best thing comes at the end: Cost Calculator for Azure is available completely for FREE!

So, why don’t you give it a try? Click the link below to download your Cost Calculator for Azure free tool by SolarWinds. No installation needed.

 

Cost Calculator for Azure – Download Free Tool

Did you ever dream you had a Ferrari® parked in your garage? How about a Porsche®? Or perhaps a finely engineered Mercedes-Benz®?

 

When I was eight years old, my father briefly flirted with the idea of buying a Ferrari. He was 38. I don't believe additional explanation is needed. However, as the oldest child, it was my privilege to accompany Dad to the showroom. And there, right next to the 308 GTB was a Ferrari bike. No, not a motorcycle. A regular pedal-with-your-feet bicycle. And I knew at that moment that this car was my destin... I mean my Dad's destiny. And that bike leaning beside it was mine, Mine, MINE!

 

You may be asking yourself why Ferrari would bother making a bicycle?

 

The obvious answer is "marketing." With a cheeky smile, Ferrari can say "anyone can own a Ferrari." But there's more to it.

 

Before I dive into the OTHER reason why, I just want to point out that car-manufacturer-bicycles is not just a thing with Ferrari. The trend started in the late 1800s with European car maker Opel® and includes Peugeot, Ford®, Mercedes-Benz, BMW®, and Porsche.

 

So what's the deal?

 

Some companies, like Opel, started with bicycles (they ACTUALLY started with sewing machines) and built up their mechanical expertise in sync with the rise of automobile technology. But most decided to build bikes as a side project. I imagine that the underlying message went something like this:

 

"Our engineers are the best in the world. They understand the complex interplay of materials, aerodynamics, maneuverability, and pure power. They are experts at squeezing every possible erg of forward thrust out of the smallest turn of the wheel. While we are used to operating on a much larger scale, we want to showcase how that knowledge and expertise translates to much more modest modes of conveyance. Whether you need to travel across the state or around the corner, we can help you get there."

 

I was thinking about that Ferrari bicycle, and the reasons it was built, as I played with ipMonitor® the other day.

 

For some of you reading this, ipMonitor will be an old and trusted friend. It may even have been your first experience with SolarWinds® solutions.

 

Some quick background: ipMonitor became part of the SolarWinds family in 2007 and has remained a beloved part of our lineup. ipMonitor is nimble, lightweight, and robust. A standalone product that installs on any laptop, server, or VM, ipMonitor can help you collect thousands of data points from network devices, servers, or applications. It's simple to learn, installs in minutes, and even comes with its own API and JSON-based query engine. Users tell us it quite literally blows the doors off the competition, and even reminds them of our more well-known network monitoring software like Network Performance Monitor (NPM) and Server & Application Monitor (SAM) server monitoring software.

 

Which is exactly why I remembered that Ferrari bicycle. It also was nimble, lightweight, and robust—a standalone product that could be implemented on any sidewalk, playground, or dirt path. It installed in minutes with nothing more than a wrench and a screwdriver, and epitomized the phrase "intuitive user interface."

 

And, like comparisons of ipMonitor to NPM, my beloved Ferrari bike was amazing until it came time to add new features or scale.

 

Much like the Ferrari bicycle, ipMonitor was designed by engineers who understood the complex interplay of code, polling cycles, data queries, and visualizations. Developers who were used to squeezing every ounce of compute out of the smallest cycle of a CPU. While used to creating solutions on a much larger scale, ipMonitor let us showcase how that knowledge and expertise translated to much more modest system requirements.

 

ipMonitor is designed to perform best in its correct context. For smaller environments with modest needs, when more feature-rich monitoring tools aren’t viable, it can be a game-changer. That Ferrari bicycle was an amazing piece of engineering—until I needed to bring home four bags of groceries or get to the other side of town. Likewise, ipMonitor is an amazing piece of engineering, but, as I said, in its correct context.

 

When you need "bigger" capabilities, like network path monitoring; insight into complex devices like load balancers, Cisco Nexus®, or stacked switches; application monitors that run scripted actions in the language of your choice; monitoring for containers and cloud; and so on, that's where the line is drawn between ipMonitor and solutions like NPM and SAM. It's not that we've deliberately limited ipMonitor, any more than Ferrari "limited" their bicycle so that it didn't have cruise control or ABS braking. Of course, this isn't an either-or proposition. No matter your monitoring needs, we've got a solution that fits your situation.

 

So, consider this your invitation to take ipMonitor for a spin. Even if you own our larger, luxury models, sometimes it's nice to get out and monitor with nothing but the feel of the SolarWinds in your hair.
