PowerVM Firmware Maintenance Overview

By Chet Mehta posted Fri June 05, 2020 02:21 AM

  

Introduction

There are millions and millions of lines of code within your Enterprise Power System.  Firmware updates are used for two main purposes.

  1. Provide new function to the existing system or support a new system (Major Release)
  2. Fix bugs (Service Pack)

[Figure: composition of the firmware components]

In the picture above, you can see the different firmware stacks and how they interact.  The image shows the classic configuration, with the HMC doing full management of the system, plus the newer PowerVM NovaLink configuration, drawn with a dotted line.  Either way, you still need to keep it all up to date!

Terminology

  • Release Level: A release with major new function (introduction of new hardware models and significant function and features enabled via firmware). Upgrading to a new Release Level is disruptive.
  • Service Pack (SP): Primarily firmware fixes and minor function changes applicable to a specific Release Level. These firmware updates are usually concurrent. Each fix in a Service Pack is one of the following types:
    1. Concurrent: A code update that allows the operating system(s) running on the Power system to continue running while the update is installed and activated.
    2. Deferred: A code fix that is installed concurrently but does not activate on the system until the server has been rebooted.  This type of fix is often related to chip initialization changes.
    3. Partition Deferred: A code fix that is installed concurrently but does not activate until the partition is reactivated.
    4. Disruptive: A code fix that requires a system reboot during the code update process.

Deferred, Partition Deferred, and Disruptive content is identified in the firmware README. 
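
The four fix types above can be sketched as a small lookup table, which is handy when planning a maintenance window. This is only an illustration: the names `Activation`, `REQUIRED_ACTION`, `plan_actions`, and `installs_without_outage` are hypothetical and not part of any IBM tool, and the action strings paraphrase the definitions above.

```python
from enum import Enum


class Activation(Enum):
    """Fix types from the Terminology list (hypothetical names)."""
    CONCURRENT = "concurrent"
    DEFERRED = "deferred"
    PARTITION_DEFERRED = "partition deferred"
    DISRUPTIVE = "disruptive"


# What each fix type needs before it takes effect (paraphrased from the text).
REQUIRED_ACTION = {
    Activation.CONCURRENT: "none: activates while the operating systems keep running",
    Activation.DEFERRED: "reboot the server to activate the fix",
    Activation.PARTITION_DEFERRED: "reactivate the partition to activate the fix",
    Activation.DISRUPTIVE: "system reboot during the code update itself",
}


def plan_actions(fix_types):
    """Return the distinct follow-up actions for a set of fix types, in enum order."""
    present = set(fix_types)
    return [REQUIRED_ACTION[a] for a in Activation if a in present]


def installs_without_outage(fix_types):
    """True when no fix in the set forces a reboot during installation."""
    return Activation.DISRUPTIVE not in set(fix_types)
```

For example, a Service Pack whose README lists only Concurrent and Deferred content installs without an outage, but the Deferred fixes stay inactive until the next server reboot.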

Service Pack Severity

  • NEW: New Features and Functions. This is considered a New Release level for a product.
  • PE (Programming Error): This Service Pack addresses minor issues. It can be installed when convenient.
  • ATT (ATTention): This Service Pack addresses low impact and low potential issues. It should be installed at the customer's earliest convenience.
  • SPE (SPEcial attention): This Service Pack addresses high impact but low potential issues. It should be installed at the customer's earliest convenience.
  • HIPER (High Impact / PERvasive): This Service Pack addresses high impact and/or pervasive issues with significant customer impacts, and therefore should be installed as soon as possible. Because "high" and "significant" customer impacts are not standard measurable units, the following guidelines are used to determine if a Service Pack needs to be marked as HIPER. A potential HIPER Service Pack is then reviewed by our POWER Firmware Distinguished Engineer. The general guidelines include:
    • Pervasiveness: The addressed issue:
      • Has already occurred in the field, or is likely to occur three or more times in the field.
      • Affects a large number of machines.
      • Occurs during a commonly used feature or function.
    • High Impact: The addressed issue can lead to:
      • Data Integrity exposure
      • System outage
      • Loss of major resource or function
      • Significant performance impact
    • No reasonable workaround exists to eliminate or reduce the exposure.

The assessment is made collectively by a team made up of Development, Test, Client Care, and Service.

NOTE: Service Packs may be categorized as "HIPER/Non-Pervasive" or "HIPER/Pervasive" based on impact and likelihood of occurrence.
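
To make the guidelines concrete, here is a rough sketch of how they could be written down as a checklist. The real decision is a human review, and the text does not say exactly how the criteria combine, so the rule below (high impact and/or pervasive, with no reasonable workaround) is an assumption. All names (`Severity`, `Issue`, `hiper_review_label`) are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    """Severities in the order the list presents them.

    NEW is left out because it marks a new release level, not a fix.
    """
    PE = 1
    ATT = 2
    SPE = 3
    HIPER = 4


@dataclass
class Issue:
    # Pervasiveness indicators from the guidelines
    occurred_or_likely_3x_in_field: bool = False
    affects_many_machines: bool = False
    in_commonly_used_function: bool = False
    # High-impact indicators from the guidelines
    data_integrity_exposure: bool = False
    system_outage: bool = False
    loss_of_major_resource: bool = False
    significant_performance_impact: bool = False
    # Workaround guideline
    reasonable_workaround_exists: bool = False

    def pervasive(self) -> bool:
        return (self.occurred_or_likely_3x_in_field
                or self.affects_many_machines
                or self.in_commonly_used_function)

    def high_impact(self) -> bool:
        return (self.data_integrity_exposure
                or self.system_outage
                or self.loss_of_major_resource
                or self.significant_performance_impact)


def hiper_review_label(issue: Issue) -> Optional[str]:
    """Return a label to bring to review, or None (assumed combination rule)."""
    if issue.reasonable_workaround_exists:
        return None
    if not (issue.high_impact() or issue.pervasive()):
        return None
    return "HIPER/Pervasive" if issue.pervasive() else "HIPER/Non-Pervasive"
```

Under these assumptions, an issue that can cause a system outage but matches none of the pervasiveness indicators would be flagged for review as "HIPER/Non-Pervasive".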

"My Notifications" and the Service Pack's README will provide details on the most susceptible configuration &/or use case.

The most important one to look out for is HIPER.  One of the worst feelings we have in development is seeing a customer's system hit a problem that we had already released a fix for.  Just because you haven't hit the issue yet doesn't mean you won't, so update!  Please review all HIPER fixes and install any applicable ones as soon as possible.

There are all sorts of ways to stay informed and be notified about new Service Packs (see the links section below). If you don't know where to get started, check out the Fix Level Recommendation Tool, which will recommend specific updates for your system.

Under the Hood of Power Firmware Maintenance

The Service Processor of the server runs an embedded operating system with complex Power firmware applications on top of it, one of which is responsible for handling code updates.  Depending on the server's configuration, a firmware update can be performed from different entities.

 

Either way, the firmware is sent to the Service Processor (the FSP) as a series of binary images.  These images are written to a special location in the FSP filesystem, which is actually an alternate boot filesystem (the FSP has 2 boot locations to support firmware updates and redundancy).  The firmware images are not only for the FSP, but also for other components such as PHYP (the hypervisor) and PFW (the partition firmware).  The FSP is then rebooted, and when it comes back up it is running the new firmware.  Depending on the type of firmware update (concurrent or non-concurrent) or upgrade, the system is either re-IPLed (disruptive) or remains booted throughout the entire firmware update process (concurrent or deferred).  If there is an error during the firmware update, the FSP automatically goes back to the previous level of firmware.  The technology behind a concurrent code update of the PHYP firmware is pretty amazing stuff: your partitions continue to run and operate while the firmware underneath them updates itself.
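
The two-boot-location idea can be illustrated with a toy model. This is a simplified sketch of the general technique, not the FSP's actual implementation: the `ServiceProcessor` class, the `boots_ok` health check, and the level names used in the example are all hypothetical.

```python
class ServiceProcessor:
    """Toy model of a service processor with two boot sides."""

    def __init__(self, level):
        # Both sides start at the same firmware level; side 0 is active.
        self.sides = [level, level]
        self.active = 0

    @property
    def running_level(self):
        """Firmware level on the side we are currently booted from."""
        return self.sides[self.active]

    def update(self, new_level, boots_ok=lambda level: True):
        """Write new_level to the alternate side, reboot onto it,
        and fall back to the previous side if the new level fails."""
        previous = self.active
        alternate = 1 - self.active
        self.sides[alternate] = new_level   # images land on the alternate boot side
        self.active = alternate             # reboot the service processor onto it
        if not boots_ok(new_level):         # hypothetical health check
            self.active = previous          # automatic return to the previous level
            return False
        return True
```

Because the new images never overwrite the side that is currently running, a failed update leaves the previous level intact and bootable. For example, after `sp = ServiceProcessor("FW-A")` and a successful `sp.update("FW-B")`, a failing `sp.update("FW-C", boots_ok=lambda level: False)` leaves `sp.running_level` at "FW-B".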

The HMC code update process is a bit simpler in that it runs on a more standard computer system.  It has the same features as the FSP-based code update process: you can always go back to the previous version if something bad happens during the update.

The Power Firmware team averages 2 major releases a year.  Though most releases include new hardware enablement, the focus of one of the releases is new function.  Each release is supported for a minimum of 2 years, with Service Packs scheduled approximately every 3 months. Besides planned Service Packs, we sometimes release an out-of-cycle (a.k.a. "emergency") Service Pack to address a critical issue that may surface (e.g. a security vulnerability).
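
As a back-of-the-envelope illustration of what this cadence implies, the figures above can be turned into rough estimates. The function names are hypothetical, and the results are approximations because the support period is a minimum and the interval is approximate.

```python
def planned_service_packs(support_months=24, interval_months=3):
    """Rough count of planned Service Packs over a release's minimum support period."""
    return support_months // interval_months


def releases_in_support(releases_per_year=2, support_years=2):
    """Rough count of releases under support at any one time."""
    return releases_per_year * support_years


print(planned_service_packs())  # about 8 planned Service Packs per release
print(releases_in_support())    # about 4 releases in support at once
```

Out-of-cycle Service Packs are not counted here, since by definition they are unplanned.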

Release Process

The release process is similar for major releases and service pack updates.  Major releases (especially ones that involve a new chip technology) rely on a lot more software simulation throughout the release process.  Each developer has access to their own full system simulator in which they can run and validate the code as they write it.  Once the hardware arrives, we shift into test phases, which become more complex and intricate as we get closer to the release date.  The service packs follow a similar version of this process.  Release readiness is judged, in part, by looking at previous releases (e.g. P8, P7, ...) and analyzing the types and numbers of defects being found by the test teams, as well as the interval between defect discoveries.

Our most recent firmware release was part of PowerVM 2.2.4.  The linked blog post provides a great summary of all of the new features released.

We perform extensive unit and end-to-end testing to provide the highest quality for both major releases and service pack updates.

Closing

Firmware updates are important!  We've recently put a huge focus on ensuring proper classification of our service packs, so please keep a close eye on them when they come out. An easy way to stay connected is to subscribe to "My Notifications" (support.ibm.com) and follow us on Twitter (@IBMPowereSupp). Both keep you informed about firmware updates, including critical firmware-related events (e.g. HIPER notifications), new Service Packs, and other news and technical notes. See the link in the "Links & Tools" section.

Links & Tools

Fix Central: Location for all Power Firmware updates, including System Firmware, HMC code, and I/O.

My Notifications: Stay informed of new firmware levels.

Fix Level Recommendation Tool (FLRT): A tool that compares current levels of firmware and software and recommends appropriate levels to update to.

Firmware Overview and Recommendations Presentation: A presentation from our Product Engineering team with more details and links.

Contacting the PowerVM Team

Have questions for the PowerVM team or want to learn more?  Follow our discussion group on LinkedIn (IBM PowerVM) or on IBM Community Discussions.



#PowerVM
#powervmblog