6. System-Level Integration

After ensuring the basic functionality of components and subassemblies, the integration process can proceed to system-level testing and integration. This is where some of the complexities such as those illustrated in Figure 1 begin to appear as the various components interact with each other. Let’s explore the implications illustrated in Figure 1 in greater detail.

1   Transportation delays and other lags in the system can trigger instability.

The primary temperature control loop for AHU1 is based on discharge temperature. This loop modulates both the economizer dampers and the chilled water valve in sequence. In a perfect world, the chilled water valve opens only if the economizer could not deliver the required leaving conditions using 100% outdoor air. A tuned system will achieve this condition in steady state operation given the appropriate outdoor climate. However, consider what happens if an upset occurs, such as a start-up. For the purposes of discussion, assume the outdoor temperature is above the set point of the mixed air low limit cycle (thus the low-limit cycle will not impact the response we are about to discuss), but significantly below the discharge temperature set point. Under these conditions, the system should be able to achieve the required discharge temperature with less than 100% outdoor air and without the use of chilled water. But, under these conditions, most systems will respond with a sequence of events similar to the following:

1   The need to cool the AHU1 casing represents a short-term, temporary load in excess of the actual load on the system. When the system first starts, it will take some time for the mixed air temperature to drop. Adding to this is the time it takes for air to move from the mixed air plenum to the sensor location and the time for the air to cool the casing, coils, fan, etc. down to its temperature.

2   Because the system does not immediately see air at the outdoor temperature at the discharge sensor location, the system will drive to 100% outdoor air and stay there until the discharge temperature begins to drop towards set point.

3   Because the outdoor air in the casing is warmed as the AHU cools, the control process will continue to see a supply temperature above set point and will continue to drive its output to provide more cooling, causing the chilled water valve to open.

4   At some point, the excess cooling capacity represented by the high volume of cold incoming outdoor air and the open chilled water valve will eliminate the pull down load and the discharge temperature will drop below set point.

5   The control loop will respond to the drop below set point by driving its output back to reduce capacity. But, for the capacity reduction to occur, the chilled water valve must close completely and the economizer dampers must modulate to reduce outdoor air flow and increase return air flow. All of these actions take time and are compounded by the fact that the heat transfer process associated with a coil tends to be non-linear; i.e., a 10% reduction in flow will generally reduce capacity by significantly less than 10%. Furthermore, the operation of an economizer process is non-linear, even when the dampers are perfectly sized, as illustrated in Figure 12.

Due to these factors, the controllers may reduce capacity too much.

6   If the capacity reduction is too much, the discharge temperature will rise above set point, causing the cycle to repeat.

Figure 12: Economizer non-linearity

At this point, a tuned control loop will reduce the deviation from set point that occurs with each over-shoot or under-shoot and eventually “capture” the set point. An out of tune loop will continue to oscillate, and the oscillations could (in a worst case scenario) increase in magnitude with each cycle.

The goal of tuning a control loop is to find a controller gain that will allow it to respond quickly to deviations from set point, but not so quickly that it overreacts and cannot “capture” the set point. There are a number of factors on the process-side and controller-side of the equation that impact this interaction, including transportation delays in the system, sensor response characteristics, controller speed, actuator response characteristics, heat transfer characteristics of the heat exchanger, and thermal characteristics of the load served. An important commissioning insight is to recognize that many of these characteristics are non-linear, can vary with the season, and can change with age. Thus, a loop that is stable at one point in time may become unstable later due to “natural causes.”
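The difference between a loop that captures the set point and one that keeps oscillating can be sketched with a toy simulation. The model below is an illustrative assumption, not the AHU1 control algorithm: at each time step the controller changes capacity in proportion to the error it measures, but the measurement lags the actual discharge condition by a fixed transport delay. The function names, gains, and delay are hypothetical.

```python
def simulate_loop(gain, delay_steps=5, steps=200, initial_error=10.0):
    """Return the discharge temperature error (degrees above set point) over time.

    Toy model (an assumption for illustration): at each time step the
    controller changes cooling capacity in proportion to the error it
    measured `delay_steps` ago, and that change shifts the discharge
    temperature by the same amount.
    """
    # Before the simulation starts, the error has been sitting at its initial value.
    errors = [initial_error] * (delay_steps + 1)
    for _ in range(steps):
        delayed_error = errors[-1 - delay_steps]  # what the sensor reports now
        errors.append(errors[-1] - gain * delayed_error)
    return errors[delay_steps:]


def late_swing(errors, window=20):
    """Largest absolute error over the last `window` steps."""
    return max(abs(e) for e in errors[-window:])


if __name__ == "__main__":
    for label, gain, delay in [
        ("modest gain, 5-step delay", 0.1, 5),
        ("high gain, 5-step delay", 0.5, 5),
        ("high gain, no delay", 0.5, 0),
    ]:
        swing = late_swing(simulate_loop(gain, delay))
        print(f"{label}: late swing = {swing:.3g}")
```

With these hypothetical numbers, the modest gain settles onto the set point, the high gain with the same delay produces oscillations that grow with each cycle, and the same high gain with no delay settles. In this sketch it is the combination of gain and lag, not the gain alone, that determines whether the loop is stable.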

2   Independent control loops serving the same HVAC process stream will interact; these interactions can affect other systems.

Figure 1 illustrates how the output of one HVAC process can become the input to another. Figure 2 and the related discussion are an excellent example of what can happen when this gets out of hand. In Figure 2, nearly all control functions of the air handling system are erratic for significant portions of the test window. In addition, the relief problems associated with the instability of the building static pressure control, combined with the poor location of the outdoor air temperature (OAT) sensor, allowed the relief air ejected through the outdoor air intake to influence the OAT reading. Consequently, the false, rapidly changing OAT reading caused other control loops linked to outdoor air temperature (reset schedules, chiller operation, etc.) to be activated and then deactivated in a relatively short time frame. The rapid cycling in the affected systems triggered instability and safety trips, and could lead to O&M problems, such as excessive wear in those systems.

3   Manually overriding outputs associated with an automated process that is unstable can provide clues regarding the source of the problem.

Figure 13 is the raw data set from which Figure 2 was developed and reflects what the provider on the project saw when the data file was first retrieved and opened. Encountering a data set like this can be overwhelming. There is obviously a problem, but with all of the instability, where does one start when trying to understand the cause? By coincidence, Figure 13 illustrates a useful troubleshooting technique that answers this question. When the heating valve was forced closed by a change in operating mode (circled in red), everything stabilized. This was a strong indicator pointing to the hot water valve as a root cause. Thus, the first step in solving the problem involved tuning the control loop for the hot water valve. Once the hot water control loop was stable, additional data was gathered and analyzed and the remaining issues were addressed. In this example, the unstable hot water valve control loop was the primary issue, but problems with return fan speed control became apparent once the data set was cleaner. And, as luck would have it, the operating mode change that forced the hot water valve to close actually turned out to be a programming mistake.

Taking manual control of an individual control element in a process with multiple elements can be a powerful troubleshooting tool when confronted with instability or other problems on multiple fronts. For the system in Figure 13, the programming mistake forced the valve closed and revealed the problem without the intervention of the provider. However, a manual command that forced the valve closed would have revealed the same information.
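One way to put a number on "everything stabilized" is to compare the variability of a trended point before and after the override. The sketch below is a minimal illustration under assumed data; the function name and the sample values are hypothetical and are not taken from Figure 13.

```python
from statistics import pstdev


def swing_ratio(before, after):
    """Ratio of trend variability after an override to variability before it.

    A value well below 1 suggests the overridden element was driving the
    instability seen in this trend.
    """
    baseline = pstdev(before)
    if baseline == 0:
        raise ValueError("trend was already flat before the override")
    return pstdev(after) / baseline


# Hypothetical discharge temperature samples (degrees F)
before_override = [60, 80, 55, 85, 58, 82]
after_override = [70, 70.5, 69.8, 70.2]
print(f"swing ratio: {swing_ratio(before_override, after_override):.3f}")
```

Repeating the comparison for each element that is overridden in turn gives a rough ranking of which loop to tune first.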

4   Cascading instability can divert attention from obvious issues.

Wild, complex data streams like those in Figure 13 can divert one’s attention from the more obvious issues. For example, the system illustrated was supposed to be operating on a schedule, but that obviously was not the case. This conclusion can be reached without taking the time to understand the complexities behind all of the instability, and, in hindsight, is obvious: the unit should not have been running at 3 a.m. in the first place. The energy savings represented by getting the schedule to work most likely exceed the energy savings associated with correcting the instability (actuator wear is another matter). But in this case, the provider was well down the road towards diagnosing and correcting the instability problem before it occurred to him that the schedule was off.
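A schedule check of this kind is simple enough to run on trend data before any deeper analysis. The following is a minimal sketch, assuming the trend is available as pairs of timestamp and fan status; the 6 a.m. to 10 p.m. schedule and the function name are hypothetical.

```python
from datetime import datetime, time


def running_off_schedule(samples, start=time(6, 0), stop=time(22, 0)):
    """Return the timestamps at which the unit ran outside its schedule.

    `samples` is an iterable of (timestamp, running) pairs, where `running`
    is True when the fan status shows the unit in operation.
    """
    return [
        ts for ts, running in samples
        if running and not (start <= ts.time() < stop)
    ]


# Hypothetical trend samples
trend = [
    (datetime(2024, 1, 15, 3, 0), True),    # running at 3 a.m.
    (datetime(2024, 1, 15, 9, 0), True),    # running during scheduled hours
    (datetime(2024, 1, 15, 23, 30), False), # off, as scheduled
]
for ts in running_off_schedule(trend):
    print("off-schedule operation at", ts)
```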

Figure 13: The raw data set that was the source for the data in Figure 2.

The clue to the cause of the problem is circled in red.

5   Thermal inertia and integrated operation and control go hand-in-hand.

The thermal flywheel represented by a building, its systems, and their components is a major factor in integrated control and operation. The flywheel:

·       Controls the time of occurrence and magnitude of the real load experienced by systems.

·       Has a major impact on the tuning of the control loops.

·       Can mask erratic operation at the system level from the perspective of the occupants of the area served.

The data in Figure 2, Figure 3, and Figure 13 are good illustrations of the masking effects of flywheels. Despite all of the large, erratic changes in temperature and flow on the supply side, the temperature in the occupied zone is very stable and right on set point. Because everyone was comfortable, nobody knew the problems highlighted by these examples existed until the commissioning provider retrieved and examined the data set. While other issues probably came into play to mask the problem from the occupants, the thermal inertia of the system and the building were probably the biggest factors.
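The masking effect can be illustrated with a first-order lag, which is a common simplification of thermal inertia. The sketch below is an assumption for illustration only: it tracks deviations from average conditions rather than absolute temperatures, and the 120-minute time constant, the 5-minute supply-side swings, and the function names are hypothetical.

```python
def zone_response(supply_deviation, time_constant_min=120.0, step_min=1.0):
    """Filter a supply-side disturbance through a first-order thermal lag.

    Both the input and the result are deviations from average conditions,
    sampled every `step_min` minutes.
    """
    alpha = step_min / time_constant_min
    zone = 0.0
    history = []
    for s in supply_deviation:
        zone += alpha * (s - zone)  # the zone moves only a small step each minute
        history.append(zone)
    return history


def swing(values):
    """Peak-to-peak range of a series."""
    return max(values) - min(values)


# Supply side swings 10 degrees either way, reversing every 5 minutes
supply = [10.0 if (minute // 5) % 2 == 0 else -10.0 for minute in range(240)]
zone = zone_response(supply)
print(f"supply swing: {swing(supply):.1f}, zone swing: {swing(zone):.2f}")
```

In this sketch a 20-degree peak-to-peak disturbance on the supply side appears in the zone as a swing of well under one degree, which is consistent with occupants noticing nothing while the trend data show erratic operation.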

6   Interactive HVAC processes can mask dysfunction in one area by equal and opposite compensation in another area.

Up until this point, our discussion of the interactions illustrated in Figure 1 has focused on the potential for them to cause instability in one loop and cascade out to others. But solving the instability problem does not eliminate all potential for dysfunction. Stable, but opposite, HVAC processes that are configured in series can interact in a manner where the downstream process masks dysfunction in the upstream process. For example, if AHU1’s preheat coil control loop (which is based on maintaining preheat air temperature set point) was stable, but out of calibration, or had a leaky control valve seat, it might deliver air warmer than required. Since the cooling coil and economizer are controlled by an independent control loop based on discharge temperature set point, their control process would attempt to compensate for the too-warm air by driving the system towards 100% outdoor air or even modulating the chilled water valve open. If sufficient cooling capacity was available, the discharge temperature control process could mask the preheat dysfunction and satisfy the space. The problem would only become apparent on a warm day, when the capacity lost in the compensation process meant a hot space. This issue brings up several important operation and commissioning points:

·       Set point coordination is critical for a multiple-loop approach: By design, systems like AHU1 are cooling systems, delivering air at a temperature below the space temperature. The preheat process will only be active when the minimum outdoor air flow rate and temperature are such that the discharge temperature begins to drop below set point. Thus, the preheat coil’s discharge set point should always be the same as or lower than the system discharge set point to prevent simultaneous heating and cooling.

·       Additional monitoring points can identify problems and save money: Providing sensors beyond those required for control helps the commissioning and operating team diagnose and correct problems. If sensors are provided downstream of each heating and cooling element, unnecessary and unwanted temperature changes can be flagged by an alarm. AHU1 has a sensor downstream of all heating and cooling elements for control purposes. But this may not always be the case. For example, if a system similar to AHU1 did not have a preheat requirement, but provided a hot water coil ahead of the cooling coil for warm-up purposes, a sensor may not be required to control the hot water coil. Instead, the hot water coil could simply be sequenced in unison with the economizer dampers and chilled water coil to provide the required discharge temperature. However, having a sensor downstream of the hot water coil would allow simultaneous heating and cooling to be detected and alarmed. For this approach to be effective, relative calibration of the sensors should be considered to eliminate false alarms due to calibration errors (see Testing Guidance and Sample Test Forms for relative calibration test guidance).

·       Alarms driven by DDC control logic provide a powerful diagnostic tool when combined with monitoring sensors: To achieve a greater “bang for the buck,” alarms can provide enhanced diagnostic capabilities beyond simply comparing a condition to a limit. For example, most DDC systems can be programmed to generate an alarm if there is a temperature rise across the warm-up coil when its control valve is commanded closed, or if the warm-up coil is active at the same time as the economizer or chilled water coil.
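The set point coordination rule in the first point above lends itself to an automated check. The sketch below is a minimal illustration; it assumes the discharge set point may take several values (for example, over a reset range), and the function name and temperatures are hypothetical.

```python
def setpoints_coordinated(preheat_setpoint_f, discharge_setpoints_f):
    """Return True if the preheat set point never exceeds the discharge set point.

    `discharge_setpoints_f` holds every value the discharge set point can
    take, so the check covers the lowest one.
    """
    return preheat_setpoint_f <= min(discharge_setpoints_f)


# Hypothetical values: discharge set point varies between 55 and 60 degrees F
print(setpoints_coordinated(50.0, [55.0, 60.0]))  # coordinated
print(setpoints_coordinated(57.0, [55.0, 60.0]))  # preheat could fight cooling
```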

From a design perspective, it might be argued that the problem could be circumvented by a different design approach. However, as discussed in Sidebar 4: Addressing the Interactive Control-Loop Problem Via a Different Design, most design options are really just a set of compromises and the alternatives will most likely have weak spots of their own.

The following section discusses system-level integration issues encountered by the Student Center commissioning provider.

Sidebar 4: Addressing the Interactive Control-Loop Problem Via a Different Design

One obvious way to eliminate the potential for control loop interaction would be to eliminate the multiple control loops and simply control the preheat coil in sequence with the chilled-water valve and economizer. This may be a viable option in some situations, but a system with a large outdoor-air fraction like AHU1 (Table A-1) in a climate zone like the one where the Student Center is located (Figure 6) will see air that is below freezing under some operating conditions. When this occurs, the system’s thermal inertia and time lags may prevent a discharge temperature based control process from reacting quickly enough to keep the air temperature downstream of the preheat coil above freezing when a major transition in operating mode occurs (a start-up for instance). The result will be freezestat trips and possibly a frozen coil. Moving the preheat coil to the minimum-outdoor-air duct would be another option, but has its own set of control issues that must be resolved. Moving the coil also introduces a significant pressure drop as well as the requirement for additional filters into the outdoor-air-intake system.

6.1. Integrating the Preheat Coil, Cooling Coil, and Economizer

The component-level testing process discussed earlier made the integration of the preheat coil, economizer, and chilled water valve a fairly simple and straightforward process. A few points worth noting are as follows:

1   The warmer weather that predominated when AHU1 underwent functional testing for integrated operation at the system level was not as demanding as extreme winter conditions could be. Because the outdoor conditions ranged from mild to hot during the final phases of the AHU1 commissioning process, the system was never challenged to operate during cold weather. From past experience, the lead provider knew that cold weather economizer operation could be significantly more challenging than warm weather operation because of problems associated with economizer non-linearity and the thermal flywheel of the system. Thus, as mentioned previously, he scheduled seasonal testing focusing on economizer/preheat coil integration to occur early in the fall season and trained the operating team on what could go wrong if a sudden cold snap occurred before the testing took place.

2   Control logic was used to enhance system diagnostics. The commissioning provider worked with the control contractor to leverage the programming power of the DDC system for diagnostics by developing several “smart” alarms, including:

·       An alarm would be generated IF the system was using preheat AND the system was not on minimum outdoor air.

·       An alarm would be generated IF the system showed a temperature rise or drop across a heating or cooling coil AND the valve associated with the coil was not being commanded open.

·       An alarm would be generated IF the chilled water coil was active AND the economizer dampers were open AND the outdoor air enthalpy was not suitable for cooling.

Some of the alarms included a reference to a graphic that could be opened by the operators to review troubleshooting tips for further diagnosis and correction of the problem. To ensure the cooperation of the control contractor in this effort, the provider had included a requirement for training operators to implement several alarms of this type. He then had the operating staff create the remaining alarms as a training exercise.
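The three “smart” alarms listed above reduce to simple Boolean tests on points the DDC system already has. The following is a sketch under stated assumptions, not the project’s actual programming: the point names, default values, and tolerances are hypothetical, “economizer dampers open” is interpreted as the outdoor air damper being above its minimum position, and the temperature tolerance stands in for the relative calibration error between sensors.

```python
from dataclasses import dataclass


@dataclass
class AhuSnapshot:
    """One set of readings from the air handling unit (hypothetical point names)."""
    preheat_valve_pct: float = 0.0
    chw_valve_pct: float = 0.0
    oa_damper_pct: float = 20.0
    min_oa_damper_pct: float = 20.0
    preheat_entering_f: float = 55.0
    preheat_leaving_f: float = 55.0
    cooling_entering_f: float = 55.0
    cooling_leaving_f: float = 55.0
    oa_enthalpy_ok: bool = True  # True when outdoor air is suitable for cooling


def smart_alarms(s, tol_f=2.0, tol_pct=1.0):
    """Return the list of diagnostic alarms raised by one snapshot."""
    alarms = []
    preheat_active = s.preheat_valve_pct > tol_pct
    chw_active = s.chw_valve_pct > tol_pct
    above_min_oa = s.oa_damper_pct > s.min_oa_damper_pct + tol_pct

    if preheat_active and above_min_oa:
        alarms.append("preheat active while above minimum outdoor air")
    if not preheat_active and s.preheat_leaving_f - s.preheat_entering_f > tol_f:
        alarms.append("temperature rise across preheat coil with valve closed")
    if not chw_active and s.cooling_entering_f - s.cooling_leaving_f > tol_f:
        alarms.append("temperature drop across cooling coil with valve closed")
    if chw_active and above_min_oa and not s.oa_enthalpy_ok:
        alarms.append("economizer open with unsuitable outdoor air enthalpy")
    return alarms


# A leaking preheat valve: commanded closed, yet the air warms by 7 degrees F
print(smart_alarms(AhuSnapshot(preheat_entering_f=45.0, preheat_leaving_f=52.0)))
```

Keeping each alarm as a separate test with its own message makes it straightforward to link the message to troubleshooting guidance for the operators.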

6.2. Integrating the Supply Fan and Terminal Unit Start-up

The configuration of the series fan-powered terminal units installed for the museum zones, combined with the physics of the forward-curved fans being used, meant that the fans would spin backward if air were blown through them. This could occur if the air handling system was started or already operating and the primary air control dampers on the terminal units were open before the series fans were started. A backward spinning fan will still deliver air to the zone but at a reduced capacity, and the condition may not be immediately detectable in the occupied zone. However, eventually the zone will no longer be able to maintain temperature set point due to the reduction in capacity. To address this issue, the provider included a functional test that verified the terminal unit fans were commanded “on” before the air handling unit was started and/or before the primary damper was allowed to open. The test cross-checked rotation and performance after restarting the terminal unit fans under both operating scenarios.

What the test failed to consider was what would happen if the fan terminal was shut down by the zone-level schedule while other zones, as well as the air handling unit, remained in operation. When this occurred, a “stop” command was sent to the terminal unit fan and the primary air damper was driven completely closed. Unfortunately, the fans stopped well in advance of the primary air damper being driven completely closed because the damper actuators on the terminal units were electronic and had a 60-second stroke time. As a result, there was enough air blowing through the terminal units to spin the fans backward. Because the fans were powered by single-phase motors, they would run in whatever direction they happened to be spinning when power was applied. If they happened to be restarted before they coasted to a stop after the primary damper closed, the terminal units would not perform.

As Murphy’s law would dictate, this unlikely scenario occurred on the day of the museum’s ribbon-cutting ceremony when an operator accidentally commanded the museum zone off and then back on again in rapid succession. The area, which had been performing flawlessly for days prior to the event, overheated by the end of festivities, resulting in an irate call to the dean from a member of the museum’s Board of Directors the following day.

At first the problem was a mystery. Because the zone had been off all night and then restarted, everything seemed to be working properly, as confirmed by a balancing contractor when his immediate presence was demanded at the site by the general contractor. It wasn’t until the commissioning provider had the operator go through the events of the preceding day, step by step, that he realized what had happened. The rapid off/on command sequence occurred in just the right order with just the right amount of time between them to allow the fans to spin to a stop, begin spinning backward, and then be restarted while spinning backward because the primary air damper was still closing. His theory was confirmed by a test that duplicated the rapid command sequence, which produced the anticipated result: low air flow to the zone. Recovery was simple enough: The zone was commanded “off” and allowed to remain off for a time period long enough to allow the dampers to close and the fans to coast to a complete stop. A normal restart returned the zone to service.
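The recovery procedure implies a minimum off time: the damper must finish closing and the fan must coast to a stop before the fan is restarted. One way such a restart permissive might be expressed is sketched below. This is a hypothetical illustration, not the fix adopted on the project; the 60-second stroke time comes from the discussion above, while the 30-second coast-down allowance and the function name are assumptions.

```python
def restart_permitted(seconds_since_stop, damper_stroke_s=60.0, coast_down_s=30.0):
    """Return True once a stopped terminal unit fan can be safely restarted.

    The fan must stay off until the primary air damper has fully closed
    (`damper_stroke_s`) and the fan has then coasted to a stop
    (`coast_down_s`, an assumed allowance).
    """
    return seconds_since_stop >= damper_stroke_s + coast_down_s


# A rapid off/on command sequence is blocked; a restart after a long stop is allowed
print(restart_permitted(10.0))   # False
print(restart_permitted(600.0))  # True
```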