Abstract The quality of the information in meteorological data time series has always been a concern for the scientific community. The scarcity of information requires the use of data fill-in techniques and methods that frequently ignore the orographic features of the study area, as well as the accuracy of the method itself, leading to inaccurate results with important consequences. In this context, this paper evaluates two methods for filling rainfall data, namely the Normal Ratio and the Linear Regression Model (LRM), applied to two morphostructural zones in the south central region of Chile, through an error analysis of a 32-year series of precipitation data. Both methods were compared considering 65 of 112 stations across the region, located on the coastal plain and in the central valley. Subsequently, two time-consistent base stations were defined, one for each area; pluviometric and proximity criteria, as well as the amount of information available, were applied to choose five neighboring stations. After the correlation between stations was calculated, the normality of the LRM models was confirmed using a probability analysis by quartiles and the Shapiro-Wilk test, as was the homogeneity of the adjusted predictions and residuals. The Normal Ratio method estimated rainfall by weighting the mean annual rainfall of the neighboring stations, where each weighting factor corresponds to the ratio between the precipitation recorded at the auxiliary station and the mean annual rainfall of that station. The performance of each method was assessed using the following estimators: Mean Error, Coefficient of Determination (CoD), Mean Squared Error (MSE), Root-Mean-Square Error (RMSE), Sum of Squared Residuals (SSR), Mean Relative Error (MRE), and Mean Absolute Percentage Error (MAPE).
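The Normal Ratio weighting described above can be illustrated with a minimal sketch. It assumes the common textbook form of the method, in which the missing value at the target station is the target's mean annual rainfall multiplied by the average of the neighbors' ratios (recorded precipitation over mean annual rainfall). The function name, argument names, and the numbers in the usage note are hypothetical and are not taken from the study.

```python
from statistics import mean


def normal_ratio_estimate(target_normal, neighbor_normals, neighbor_values):
    """Estimate a missing precipitation value at a target station.

    Assumed (textbook) form of the Normal Ratio method:
        P_x = N_x * mean(P_i / N_i)
    where N_x is the mean annual rainfall of the target station, and
    N_i, P_i are the mean annual rainfall and the recorded precipitation
    of neighboring station i for the period being filled.
    """
    if not neighbor_normals or len(neighbor_normals) != len(neighbor_values):
        raise ValueError("need one recorded value per neighboring station")
    if any(n <= 0 for n in neighbor_normals):
        raise ValueError("mean annual rainfall must be positive")

    # One weighting ratio per neighboring station: recorded / mean annual.
    ratios = [p / n for p, n in zip(neighbor_values, neighbor_normals)]
    return target_normal * mean(ratios)
```

For instance, with a hypothetical target station whose mean annual rainfall is 1000 mm and two neighbors with means of 800 mm and 1200 mm that recorded 400 mm and 600 mm, both ratios equal 0.5 and the sketch returns 500 mm.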
The statistical analysis reveals a greater range of temporal variation in precipitation in the Central Valley relative to the Coastal Zone, except for one station, and a positive relationship between altitude and a broader pluviometric range. LRM shows greater data dispersion at station Chiguayante; moreover, according to the CoD, this is the station with the lowest prediction potential. In most of the cases analyzed, we found an inverse relationship between the sum of squared residuals (SSR) and the number of annual precipitation data available in each station. The estimators SSR, MSE, and RMSE penalize large residuals, revealing that for the 32-year series studied, the Normal Ratio yields better performance and lower prediction error in the target stations in both morphostructural areas, with Dichato as the station with the lowest mean error and Mayulermo as the station with the lowest mean relative error, for both methods in the sample selected. As Dichato was the station with the greatest Euclidean distance from the base station, distance is discarded as a major predictive factor, in contrast to data dispersion. The analysis of residuals (SSR, MSE, RMSE) indicated that the Linear Regression Model is influenced by outliers. These values were nevertheless retained, since eliminating extreme values, as is usually done in regression analysis, may result in losing relevant information about maximum and minimum precipitation that is useful in the analysis of extreme climatic events such as drought. The efficiency of both methods for predicting actual values was evaluated through the estimators SSR and CoD, showing that in the present analysis, the Normal Ratio yields a higher CoD and a lower residual variability.
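The estimators used in this comparison can be computed as in the following sketch. The definitions are the common ones and are assumptions, since the abstract does not state its formulas: residuals are taken as predicted minus observed, MRE and MAPE divide by the observed value (which must therefore be nonzero), and the CoD is computed as 1 - SSR/SST, which may differ from a squared correlation coefficient. The function name is hypothetical.

```python
import math


def error_estimators(observed, predicted):
    """Return common error estimators for paired observed/predicted values.

    Assumed definitions (not taken from the paper):
      residual = predicted - observed
      CoD      = 1 - SSR / SST
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty, equal length")
    if any(o == 0 for o in observed):
        raise ValueError("relative errors are undefined for zero observations")

    residuals = [p - o for o, p in zip(observed, predicted)]
    ssr = sum(r * r for r in residuals)  # squaring penalizes large residuals
    mse = ssr / n
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)

    return {
        "ME": sum(residuals) / n,
        "SSR": ssr,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MRE": sum(r / o for r, o in zip(residuals, observed)) / n,
        "MAPE": 100.0 * sum(abs(r / o) for r, o in zip(residuals, observed)) / n,
        # CoD is undefined when the observations do not vary.
        "CoD": 1.0 - ssr / sst if sst > 0 else float("nan"),
    }
```

Because SSR, MSE, and RMSE square each residual, a single large miss dominates them, whereas the dimensionless MRE and MAPE allow stations with very different rainfall totals to be compared on the same scale.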
Although regression remains a widely used and recommended method, the Normal Ratio should be reconsidered for the prediction of missing data in precipitation series in areas of south central Chile where neighboring stations have records that can supply the data the equation requires. The quadratic estimators MSE and RMSE suggest that the stations with a lower mean error, where the predictive methods analyzed were most successful, were those where precipitation behaved more stably around the mean. The dimensionless estimators MRE and MAPE confirmed the advantage of the Normal Ratio and indicated that the best mean prediction performance was related to data dispersion rather than to the Euclidean distance between stations and the base station. The two methods evaluated offer a simple way to estimate meteorological data when the information available is insufficient; however, the Normal Ratio performed better than LRM for estimating missing precipitation data, regardless of the geomorphological area selected.
Summary One of the main concerns of scientists working with time-series data is the quality of the information. Meteorological data, which serve as inputs to hydroclimatic models and predictions, generally lack complete series. Data fill-in techniques frequently ignore the orographic features of the study area and the accuracy of the method, distorting the results with important consequences. The aim of this work is to evaluate the Normal Ratio and Linear Regression Model (LRM) methods for filling rainfall data, through an analysis of the estimation error applied to a 32-year precipitation record in two distinct morphostructural units located in the Biobío region, south central Chile: the coastal plain and the central valley. The results showed that the Normal Ratio method presents lower variability in the estimation errors and a closer approximation to the actual data for both zones.