Performance Management

Performance Management Practices:

  1. Identify performance problems.
  2. Identify an optimization strategy.
  3. Plan, tune and execute the optimization strategy.
  4. Analyze and monitor the system.

Performance Management of a Business System

Seen through an Agile mind-set, performance tuning is an activity that:

  1. occurs repeatedly, at increasing levels of detail;
  2. is done as a scoped, timeboxed, goal-oriented effort;
  3. is driven solely by the need to provide business value.

Performance Management Basics

Key Activities

There are four (4) key activities that occur while managing performance:

  1. Stress and Load Testing
    • determining how much load a site can handle while maintaining SLAs;
    • determining whether defects arise under load;
    • determining whether a site/app can recover from overloading.
  2. Profiling
  3. Capacity Planning
  4. Benchmarking
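
As a rough illustration of the first load-testing goal (how much load a site can handle while maintaining SLAs), the sketch below takes latency samples already collected at several load levels and finds the highest level that still meets the SLA. The function names, the choice of a 95th-percentile latency as the SLA measure, and the sample numbers are all assumptions made for this example, not something these notes prescribe.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def max_load_within_sla(results, sla_ms, pct=95):
    """Return the highest load level whose pct-th percentile latency meets the SLA.

    `results` maps a load level (e.g. concurrent users) to latency samples in ms.
    Levels are checked in ascending order and the search stops at the first
    breach. Returns None if even the lowest level breaches the SLA.
    """
    best = None
    for load in sorted(results):
        if percentile(results[load], pct) > sla_ms:
            break
        best = load
    return best


# Hypothetical measurements: latencies (ms) observed at 10, 50 and 100 users.
measurements = {
    10: [100] * 20,
    50: [100] * 19 + [400],
    100: [100] * 18 + [900, 950],
}
supported = max_load_within_sla(measurements, sla_ms=500)
```

With these made-up numbers, the 100-user level is the first to breach a 500 ms SLA, so `supported` is 50. A real test would also have to cover the other two goals: defects under load and recovery from overload.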

Core Principles

  1. Your test environment should be constructed to be as close to production as possible.
    1. Building-block hardware
    2. Infrastructure version and patch level
    3. Configuration (all tiers, including OS)
    4. Core monitoring
  2. Know your User.

Practices

  1. Fully-Automated Deployment
    1. Externalized environment configuration
    2. Model and automate

Client Stuff

  • Metrics
    • # of sessions per day = a day's volume
    • # of concurrent sessions within a given session timeout (say 1 hour) = peak volume from a memory perspective
    • Track metrics along the main flow over a sliding window (report every 5 minutes):
      • # of total sessions.
      • # of sessions where items are added to the cart.
      • # of sessions where checkout occurs.
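
One way to track the three funnel counts above over a sliding window might look like the following sketch. The class name, the event names (`visit`, `cart_add`, `checkout`) and the assumption that events arrive in timestamp order are hypothetical choices for this example. "Total sessions" is taken to mean sessions with any event inside the window.

```python
from collections import deque


class SlidingWindowFunnel:
    """Counts distinct sessions per funnel stage over a sliding time window."""

    EVENTS = ("visit", "cart_add", "checkout")

    def __init__(self, window_seconds=300.0):
        self.window_seconds = window_seconds
        # (timestamp, session_id, event), oldest first; assumes in-order arrival.
        self._events = deque()

    def record(self, timestamp, session_id, event):
        if event not in self.EVENTS:
            raise ValueError(f"unknown event: {event!r}")
        self._events.append((timestamp, session_id, event))

    def report(self, now):
        """Drop events older than the window, then count distinct sessions."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        total, cart, checkout = set(), set(), set()
        for _, session_id, event in self._events:
            total.add(session_id)
            if event == "cart_add":
                cart.add(session_id)
            elif event == "checkout":
                checkout.add(session_id)
        return {
            "total_sessions": len(total),
            "sessions_with_cart_add": len(cart),
            "sessions_with_checkout": len(checkout),
        }
```

Calling `report(now)` every 5 minutes with the default 300-second window gives the "report every 5 minutes" cadence. A shorter reporting interval with the same window would give overlapping, smoother readings.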

Metrics/Measures/Calculations

  • Availability
    • = realized uptime / expected uptime, where "uptime" is the time during which the end-user can perform the revenue-producing action.
    • = MTBF / (MTBF + MTTR), where MTBF = mean time between failures and MTTR = mean time to repair.
    • (e.g. availability = 99.9%)
  • Performance: how quickly the system responds.
  • Scalability: how many concurrent users are supported at peak.
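
The two availability formulas above can be worked through in a few lines. This is a minimal sketch: the function names and the example figures (an MTBF of 999 hours, an MTTR of 1 hour, a 720-hour month) are assumptions chosen to land on the 99.9% example.

```python
def availability_from_uptime(realized_uptime, expected_uptime):
    """Availability = realized uptime / expected uptime (same time unit for both)."""
    if expected_uptime <= 0:
        raise ValueError("expected uptime must be positive")
    return realized_uptime / expected_uptime


def availability_from_mtbf(mtbf, mttr):
    """Availability = MTBF / (MTBF + MTTR) (same time unit for both)."""
    if mtbf + mttr <= 0:
        raise ValueError("MTBF + MTTR must be positive")
    return mtbf / (mtbf + mttr)


def allowed_downtime(availability, period):
    """Downtime budget implied by an availability target over a period."""
    return (1.0 - availability) * period


# Hypothetical figures: a failure every 999 hours on average, 1 hour to repair.
a = availability_from_mtbf(mtbf=999, mttr=1)        # 0.999, i.e. 99.9%
budget_hours = allowed_downtime(a, period=30 * 24)  # about 0.72 hours per 30 days
```

At 99.9%, the budget works out to roughly 43 minutes of downtime in a 30-day month, which is often a more tangible number to discuss than the percentage itself.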

Questions

  • What are the metrics we should track?
    • There are measured values and there are calculated values. What are they all? What does that whole picture look like?
    • What are the important values?

Uncategorized Notes

  • Performance/capacity improvement comes from a mixture of more and faster hardware and tuning how those computing resources are used.
  • The most effective performance technique is elimination.
  • One of the most expensive, impactful and risky moves is to change code. This is a point of high leverage: if you can reduce that cost and risk, changing code becomes a more viable option.
  • Complexity: the more complex the system, the more ways it can fail and the more likely a failure is to occur.

"Monitors are the unit tests of systems"

  • When the root cause of a problem is monitorable, put a monitor on it. It helps mitigate recurrences and reminds everyone of the mistake we made…automatically.
  • It inches our level of confidence in proper execution upward: "we know it's not logging because our logging monitor is green."
  • Keep it simple and minimize impact:
    • the right level of "richness": favor counters over events.
    • minimize the smarts.
    • shoot data into a queue asynchronously.
    • never ever emit an exception; monitors fail quietly. Their lack of reporting is the signal there's something wrong.
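
The "keep it simple" guidelines above could be sketched as follows: counters rather than rich events, a queue drained asynchronously by a background thread, and no exception ever escaping into the caller. The class and method names are hypothetical, and a real monitor would ship the counts to whatever reporting system is in use rather than hold them in memory.

```python
import queue
import threading
from collections import Counter


class QuietMonitor:
    """Counter-based monitor that records asynchronously and fails quietly."""

    def __init__(self, max_pending=1000):
        self._pending = queue.Queue(maxsize=max_pending)
        self._counts = Counter()
        self._lock = threading.Lock()
        # Daemon thread: it never keeps the application alive on shutdown.
        threading.Thread(target=self._drain, daemon=True).start()

    def increment(self, name, amount=1):
        """Called from application code: never blocks, never raises."""
        try:
            self._pending.put_nowait((name, amount))
        except Exception:
            pass  # queue full or other trouble: drop the sample silently

    def _drain(self):
        while True:
            name, amount = self._pending.get()
            try:
                with self._lock:
                    self._counts[name] += amount
            except Exception:
                pass  # a bad sample must not kill the drain thread
            finally:
                self._pending.task_done()

    def flush(self):
        """Block until every queued sample has been processed."""
        self._pending.join()

    def snapshot(self):
        with self._lock:
            return dict(self._counts)
```

Because `increment` swallows every error, a broken monitor shows up only as a counter that stops moving, which matches the idea that the lack of reporting is itself the signal.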