# Method ## Basic The MIdAS Python implementation aims to perform efficient bias adjustment of large data sets, such as multi-decadal climate model data over large domains. Adjustments are performed for - single grid points, i.e. spatial dependency is not accounted for - single variables, i.e. dependency between variables is not accounted for - daily time steps - model data when model data and reference data are located on the same grid ### Input The input data, both for `reference` and `model`, should come in the form of timeseries with a fixed timestep, e.g., daily. We focus on the adjustment of model output that comes in the form of a three dimensional field with two horizontal dimensions and one time dimension, {math}`X(x, y, t)`. Often, the horizontal dimensions will be given by longitude and latitude, but particularly for regional models, other dimensions are common too, for example in projections such as the Lambert conformal projection. These fields embody physical phenomena, e.g. precipitation and near-surface temperature and are referred to as variables. Many variables have standard variable names, for example, precipitations is usually called `pr` in files and near-surface temperature is often referred to as `tas`. In addition, meta-data like `units` and `standard_name` are often specified in additional standards, for example, the `CMIP6 data request` (https://cordex.org/experiment-guidelines/cordex-cmip6/requests-to-cmip6/) for global climate model output, or the corresponding `CORDEX standard` (https://cordex.org/experiment-guidelines/cordex-cmip5/cordex-core/cordex-core-simulation-framework/)for the output of regional climate models. Technically, the input data is always in `netCDF` files that adhere to the `CF conventions`(https://cfconventions.org/) and follow the aforementioned applicable standards. Bringing such metadata to CF-convention conformity is the process `Standardization`. ### Output template generation The output filename can be set by the `-o` command flag. When the `-o` flag is missing, MIdAS automatically generates an output template for the filename given by user, ie., outputfilename.tmp.nc. This works well for files with the same basic filename structure, where only the time stamp, variable and simulation components differ. For example, given a variable `tas` and two input files: :::{code} 1:"/PATH/tas_{project}_historical_{run_model_version}_day_20060101-20101231.nc" 2:"/PATH/tas_{project}_scenario_{run_model_version}_day_200110101-20151231.nc" ::: The output template starts with the variable that is being computed, in this case `tas`. Thereafter, the parts that are similar between the files are kept. If the input files contains both historical and scenario simulations, these will be concatenated with a hyphen in the output template. The time period of the template will always represent the whole period as covered by all files. Thus, it will be reformatted to ensure hyphenated form, `start-end`, even in the case of a single timestamp, e.g, a single year. If the input files contains a frequency keyword, the keyword will be replaced by the output frequency. Otherwise, the output frequency will be added to the end of the output template. The input files from above would result in the following output template: ```{code-block} bash "tasAdjust_{project}_{run_model_version}_{frequency}" ``` ```{Note} This works well with CMIP/CORDEX style filenames. ``` ## Main features (QAM)= ### Quantile mapping The core of bias adjustment in MIdAS is an empirical quantile mapping (EQM) method, where the cumulative distribution functions (CDFs) of the reference data set and of the model data set are used to bias-adjust the original value from model data. We use superscripts to distinguish `reference` and `model` by writing {math}`X^{\text{ref}}_j` and {math}`X^{\text{mod}}_j`, respectively, where `j` stands for each timestep. Since we are only interested in statistical properties of the timeseries, they may have differing time coordinates, which we refer to as {math}`t^{\text{ref}}_j` and {math}`t^{\text{mod}}_j` respectively. The bias adjustment consists of two steps, the calibration and the adjustment. In the calibration step, a transfer function is constructed, which is later applied in the adjustment step. The overall bias adjustment looks like this: 1. Use calibration {math}`C` to compute transfer function(s) ```{math} :label: mymath F_{j} := C_{j}(t^{\text{ref}}_j, X^{\text{ref}}_j, t^{\text{mod}}_j, X^{\text{mod}}_j) ``` 2. Adjust model data using transfer function(s) ```{math} :label: mymath_xm \tilde{X}^{\text{mod}}_j = F_{j}(t^{\text{mod}}_j, X^{\text{mod}}_j) ``` The adjustment step is completely local and thus particularly well suited for parallelization. Note that the adjustment is never applied to the reference data. We treat the timeseries for every point independently. This implies, that `reference` and `model` data must be located on the same spatial grid. When this is not the case, a remapping must be performed outside of MIdAS before the data can be used as input. ```{Note} We limit ourselves to daily data. Dissaggregation to sub-daily can easily be performed outside MIdAS. ``` ### Day-of-year The bias adjustment is performed separately for different times of the year. MIdAS uses a day-of-year (DOY = [1,365]) separation where the time series are split in smaller sections depending on the day of the year. Typically, a moving window of 15 days before and after is used ({math}`dD = 15`). We here assume a 365-year calendar, i.e. without leap days. Leap days are not explicitly handled here, and the DOY is defined from the first of January and counts the days until 365 is reached. Therefore,the 31 December on leap years is not included, but will be bias-adjusted with the same parameters as DOY = 365. (sf)= ### Linear smooth spline fitting We fit a linear smoothing spline (LSS) function to the Q–Q plot (similarly to the `fitQmapSSPLIN` function proposed by [Gudmundsson et al., 2012](REF),[2016](REF), using the routine `splrep` from [Virtanen et al. (2020)](REF), which in turn is based on the `FITPACK` routines by Paul Dierckx ([Dierckx, 1975, 1981, 1982, 1993,1995](REF)). This approach allows a good approximation of {math}`F_{j}` with far fewer knots in the spline, while still guaranteeing a good representation of all data points. A linear function is fitted to the 90% most central data points (e.g., the lowest and highest 5 % of data are removed) to avoid excessive impact from outliers in the tails. The weights to the spline function are then defined according to the standard deviation of every point, which is estimated as the distance of that point from the linear fit. To avoid excessive weights for individual points, a minimal standard deviation of 1 percent of the midpoint of all reference values are set to the threshold. All points that have an estimated standard deviation smaller than this get assigned the minimal standard deviation as their estimate. The smoothing factor to the spline function is set the number of data points. ### Time-scale Separation The time-scale separation involves the computation of anomalies. This is commonly done in one of two ways, additive or multiplicative. (addTS)= #### Additive Time-scale Separation We calculate the daily mean, {math}`X_{m}`, for a day, {math}`d` with a given window size {math}`31` days as: ```{math} :label: xm X_{m} = \left\langle X_{d'} \right\rangle_{d' \in [d-15, d+15]} ``` The daily anomalies ,{math}`X_{da}`, are calculated as: ```{math} :label: xda X_{da} = X_{d} - X_{m} ``` where, {math}`w` refers to as a week. This allows us to decompose the original data as: ```{math} :label: xd X_d = X_{m} + X_{da} ``` This decomposition facilitates the separate bias adjustment of the three components, giving the final bias adjusted data as: ```{math} :label: mymath_xddecomp \tilde{X}_d = \tilde{X}_{m} + \tilde{X}_{da} ``` #### Multiplicative Time-scale Separation We calculate the daily mean, {math}`X_{m}`, for a day, {math}`d` in the same way as for the additive case, i.e. as a running mean with a given window size: {math}`31` days as: ```{math} :label: mul_XM X_{m} = \left\langle X_{d'} \right\rangle_{d' \in [d-15, d+15]} ``` The daily anomalies are now calculated as: ```{math} :label: mul_XDa X_{da} = X_{d} / X_{m} ``` This allows us to decompose the original data as: ```{math} :label: mul_XD X_d = X_{m} * X_{da} ``` This decomposition facilitates the separate bias adjustment of the three components, giving the final bias adjusted data as: ```{math} :label: mymath_XDdec \tilde{X}_d = \tilde{X}_{m} * \tilde{X}_{da} ``` ### Extrapolation A two-step solution is found work well for this issue: (i) excluding data from the tail data in the calibration period, the extrapolation feature is activated for all time periods, even the calibration period, and (ii) applying an outlier insensitive method for linear regression to find an extrapolation parametrisation for the tail ([Berg et al., 2024](REF)). The Theil-Sen ([Theil, 1950](REF); [Sen, 1968](REF)) regression (outlier insensitive method) and the normal regression (outlier sensitive method) are implemented to be used to fit. The former fits median of slopes derived from each individual pair of points in the tail, wheras the latter fits the extreme values by using normal least square. To active it, users need to determine the method to be used, the Theil-Sen (theilsen) or the normal regression (ols), and percentage of data to be excluded. The syntax is `extrapolation_method`_`extrapolation_parameters`. For example, `theilsen_r01t05` implies that 1) the 1% of the data on the upper tail is excluded; 2) the theil-sen method is used for fitting and its parameters are calibrated on percentiles 94-99. (sp)= ## Indirect bias adjustment For some variables, bias-adjusting meteorological variables in a direct way (same type of variables) may introduce errors. Indirect bias adjustment is therefore needed. (hursinv)= ### hurs_inv Bias-adjustment for inverse relative humidity, `hurs_inv`, is implemented. It is useful in the region where relative humidity is usually high. Before bias-adjustment, the relative humidity is inverted as: ```{math} :label: mymath_hursinv hurs\_\text{inv}= 100 - hurs ``` After bias-adjustment, bias-adjusted inverse relative humdity is then inverted to be relative humidity by: ```{math} :label: mymath_hursadj \widetilde{hurs}^{\text{mod}}_j = 100 - \widetilde{{hurs\_\text{inv}}}^{\text{mod}}_j ``` A range check is always made at the end in order to ensure {math}`\widetilde{hurs}^{\text{mod}}_j = [0,100]`. (tasmax_tasmin)= ### tasmax & tasmin Daily maximum temperature, `tasmax`, and daily minimum temperature, `tasmin`, are indirectly adjusted. Instead of an independent bias adjustment of `tasmax` and `tasmin`, we follow the ISIMIP3 method to bias-adjust these two variables from bias-adjusted `tas`, `tasrange`, and `tasskew` values. `tasrange` describes daily temperature range, which is calculated as: ```{math} :label: mymath_tasrange tasrange = tasmax − tasmin ``` `tasskew` describes the skewness of the daily temperature cycle with consideration of daily temperature range, daily mean temperature and daily minimum temperature. It is calculated as: ```{math} :label: mymath_tasskew tasskew = (tas − tasmin)/tasrange ``` `tasrange` and `tasskew` are pre-processed outside MIdAS. These two variables are bias-adjusted by the same way as precipitation (operator, pr) without using `-s`. Bias adjustment of `tasmax` and `tasmin` are finally achieved via post-process outside MIdAS: ```{math} :label: mymath_tasmaxadj \widetilde{tasmax}^{\text{mod}}_j=\widetilde{tasrange}^{\text{mod}}_j+\widetilde{tasmin}^{\text{mod}}_j ``` ```{math} :label: mymath_tasminadj \widetilde{tasmin}^{\text{mod}}_j=\widetilde{tas}^{\text{mod}}_j - \widetilde{tasskew}^{\text{mod}}_j*\widetilde{tasrange}^{\text{mod}}_j ```