How Big Data analytics helps in weather forecasting: the example of Windy.app

You have surely heard of the concept of Big Data.

In short, it is a volume of data, about anything at all, so large that only supercomputers can store and analyze it. In other words, it is very complicated.

But Big Data has one major advantage, which follows from the name of the concept: it gives you a big picture that you can't get in any other way. That's why, as computers have developed into supermachines, Big Data has come to be used in many areas of life, including meteorology, where it is one of the most important concepts. Weather forecasting is actually the oldest field to use Big Data. At the same time, Big Data is the future of meteorology.

In this article, two Windy.app experts talk about the relationship between Big Data and weather. In other words, you will find out what lies behind the simple-looking forecast table or weather map in your weather app.

But we'll start at the very beginning: collecting Big Data with weather stations and other weather instruments.

Ilya Drigo, professional meteorologist, developer, and researcher

Pavel Konstantinov, assistant professor in the Department of Meteorology and Climatology at Lomonosov Moscow State University (MSU), Ph.D.

How we collect Big Data from weather stations and other weather instruments

Ilya: A necessary digression. One of the best-known models used to describe Big Data is the so-called 5V model. According to it, Big Data has the following properties: Volume, Velocity, Variety, Veracity, and Value. Weather forecasting data was among the first to fully match the 5V model. Indeed, these are data of enormous size (Volume) about a rapidly changing environment (Velocity), obtained from completely different sources (Variety), whose accuracy must be verified and assessed (Veracity), and which are important for countries' economies and people's lives (Value).

So the first step toward a quality weather forecast is collecting data about atmospheric conditions, and the more data, the better. Every day, meteorologists around the world gather, process, and analyze terabytes of information about the state of the atmosphere and the oceans from all kinds of sources. The five main weather instruments are weather stations, weather satellites, weather buoys, weather balloons, and weather radars.

Of these, weather stations are the most numerous: there are about 40 thousand of them around the world, counting only the official ones. Put simply, these are weather observation points. They record data at their location and send it to data processing centers. At Windy.app, we don't just show ready-made forecasts. We also continuously receive and display, in real time, data from tens of thousands of these stations around the world, information about the state of the ocean, and even real-time precipitation data.
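
Windy.app's own checks are not described here, but one common first step in the Veracity stage is a gross-error range check that rejects physically implausible station readings. The minimal Python sketch below illustrates the idea; the `Observation` type, the variable names, and the limits in `PLAUSIBLE_RANGES` are all hypothetical, set just beyond known world records.

```python
from dataclasses import dataclass

# Hypothetical gross-error limits, set just beyond known world records.
# Real QC systems also use station-specific climatology and time checks.
PLAUSIBLE_RANGES = {
    "temperature_c": (-90.0, 60.0),
    "wind_speed_ms": (0.0, 120.0),
    "sea_level_pressure_hpa": (850.0, 1090.0),
}


@dataclass
class Observation:
    """One hypothetical station reading."""
    station_id: str
    variable: str
    value: float


def passes_range_check(obs: Observation) -> bool:
    """Return True if the value lies inside the plausible range for its variable."""
    bounds = PLAUSIBLE_RANGES.get(obs.variable)
    if bounds is None:
        return False  # unknown variable: reject rather than guess
    low, high = bounds
    return low <= obs.value <= high


def filter_plausible(observations):
    """Keep only the observations that pass the range check."""
    return [obs for obs in observations if passes_range_check(obs)]


readings = [
    Observation("A", "temperature_c", 21.5),
    Observation("B", "temperature_c", 999.0),  # likely a sensor or encoding fault
    Observation("C", "wind_speed_ms", -3.0),   # a negative speed is impossible
]
print([obs.station_id for obs in filter_plausible(readings)])
```

Only station "A" survives; in practice a range check is just the cheapest of several layers, followed by checks for sudden jumps and disagreement with neighboring stations.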

Pavel: Weather stations are supervised by the countries in which they are located. There is also the World Meteorological Organization (WMO), a specialized agency of the UN whose purpose is to ensure that the number of these stations does not decrease and that the system remains operational.

Incidentally, weather forecasting is the oldest field to use Big Data. In fact, the exchange of weather data was the world's first example of such information flowing freely and free of charge. This is why meteorologists are rightly considered citizens of the world.

There is a notion that land meteorology is simpler than marine meteorology because there are more land stations. I would not agree. Although there are indeed more land stations, environmental conditions on land are more diverse, so it is harder to produce a forecast there than for the more or less uniform water surface of a sea or bay. At the current stage of development of weather observation, we simply cannot call one simpler and the other more difficult.

Weather radars are a relatively new invention, but we have high hopes for them. Radars let us see the bigger picture of the weather. For instance, when a forecast shows a big rain cloud passing over a region, this data most often comes from radars.
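
Radars do not measure rain directly: they measure reflectivity, usually in dBZ, which is converted to a rain rate with an empirical Z-R relationship of the form Z = aR^b. The sketch below assumes the classic Marshall-Palmer coefficients (a = 200, b = 1.6); operational products, including whatever Windy.app displays, may use different relations. At 40 dBZ it gives roughly 11.5 mm/h.

```python
# Marshall-Palmer coefficients for Z = A * R**B (an assumption; products vary).
A_COEFF = 200.0
B_COEFF = 1.6


def dbz_to_linear(dbz: float) -> float:
    """Convert reflectivity from dBZ to linear units (mm^6 / m^3)."""
    return 10.0 ** (dbz / 10.0)


def rain_rate_mm_per_h(dbz: float) -> float:
    """Invert Z = A * R**B to estimate the rain rate R in mm/h."""
    z = dbz_to_linear(dbz)
    return (z / A_COEFF) ** (1.0 / B_COEFF)


for dbz in (20, 30, 40, 50):
    print(f"{dbz} dBZ -> {rain_rate_mm_per_h(dbz):.1f} mm/h")
```

Because the scale is logarithmic, every extra 10 dBZ multiplies Z by ten and the rain rate by about four.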

Timur Garifov / Unsplash

How we work with Big Data: store it and process it using weather forecast models

Ilya: The collected weather data is sent to data centers and then used for forecast calculations. Nowadays, forecasts are calculated with complex algorithms called numerical weather prediction models. They work by solving the equations of fluid dynamics and thermodynamics that describe the behavior of the atmosphere. As input, these models use meteorological measurements that describe the state of the atmosphere all around the world.
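
Real models such as WRF solve the full three-dimensional equations, which is far beyond a short example. As a toy illustration of the same principle (discretize an equation on a grid, then step it forward in time), the sketch below advances the one-dimensional advection equation ∂u/∂t + c ∂u/∂x = 0 with a first-order upwind scheme on a periodic grid. The function name and parameters are hypothetical.

```python
def advect_upwind(u, courant, steps):
    """Toy solver for du/dt + c * du/dx = 0 with c > 0 (hypothetical helper).

    Uses a first-order upwind scheme on a periodic grid. `courant` is
    c * dt / dx and must lie in (0, 1] for the scheme to be stable.
    """
    if not 0.0 < courant <= 1.0:
        raise ValueError("Courant number must be in (0, 1] for stability")
    u = list(u)
    for _ in range(steps):
        # u[i - 1] wraps to u[-1] when i == 0, giving periodic boundaries.
        u = [u[i] - courant * (u[i] - u[i - 1]) for i in range(len(u))]
    return u


pulse = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(advect_upwind(pulse, courant=1.0, steps=2))  # [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
print(advect_upwind(pulse, courant=0.5, steps=3))  # pulse moves and smears out
```

With a Courant number of 1 the pulse moves exactly one cell per step; with 0.5 it moves more slowly per step and spreads, an example of the numerical diffusion that real models work hard to keep small.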

Weather forecasting is so computationally demanding that it runs on the most powerful supercomputers available. Calculating the weather for the whole world is so complicated and expensive that only a few hydrometeorological centers in the world can afford it.

At the same time, modern weather forecasts are remarkably accurate: for example, we can predict tomorrow's weather with an accuracy of about 90–92%.

But weather forecasts can be calculated not only for the whole world but also locally. At Windy.app, we also run our own model, WRF8. With it, we calculate a forecast every day for the whole of Europe and for East Asia (Japan and South Korea). We use cloud supercomputing power and the efficient code of the WRF-ARW model to provide our users with some of the most accurate everyday weather forecasts available on the market today.

The WRF model is developed and supported by a worldwide community of meteorologists and developers. Because efficient parallelization and high code performance are critical, WRF-ARW is implemented in Fortran, a general-purpose, compiled, imperative programming language that is especially suited to numerical and scientific computing. Even though WRF-ARW is open source, tuning it effectively takes a great deal of knowledge and effort from the experts involved: meteorologists, developers of high-performance parallel code, and DevOps specialists.

Timur Garifov / Unsplash

So every night we download fresh weather data from the servers of the National Oceanic and Atmospheric Administration (NOAA), process it, and use it as initial and boundary conditions to run our WRF8 model. We calculate a 3-day forecast for the whole of Europe at a resolution of 8 km (5 mi) and for East Asia at a resolution of 3 km (1.9 mi).
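
To get a feel for the numbers, the sketch below counts model time steps for a 72-hour run using the common WRF rule of thumb that the time step in seconds should be at most about 6 times the grid spacing in kilometers. The helper names and the domain size in the usage example are hypothetical, not Windy.app's actual configuration.

```python
import math

FORECAST_HOURS = 72  # the 3-day forecast described above


def time_step_seconds(dx_km: float) -> float:
    """WRF rule of thumb: time step (s) of about 6 x grid spacing (km)."""
    return 6.0 * dx_km


def number_of_steps(dx_km: float, hours: int = FORECAST_HOURS) -> int:
    """Steps needed to integrate the model over `hours` (hypothetical helper)."""
    return math.ceil(hours * 3600 / time_step_seconds(dx_km))


def grid_columns(width_km: float, height_km: float, dx_km: float) -> int:
    """Horizontal grid points for a rectangular domain (hypothetical size)."""
    return math.ceil(width_km / dx_km) * math.ceil(height_km / dx_km)


for dx in (8.0, 3.0):
    print(f"dx = {dx} km: dt = {time_step_seconds(dx):.0f} s, {number_of_steps(dx)} steps")
print(grid_columns(4000, 4000, 8.0))  # hypothetical 4,000 x 4,000 km domain
```

Under these assumptions an 8 km grid needs 5,400 steps for 3 days and a 3 km grid needs 14,400, while a 4,000 by 4,000 km domain at 8 km spacing already has 250,000 grid columns, each with dozens of vertical levels.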

To make the system as efficient as possible, we use cloud supercomputing power provided by Oracle. Grid calculations are parallelized with the MPI and OpenMP parallel programming paradigms, so the most efficient runs require a cluster with a low-latency internal network and a large number of CPU cores. In addition, compiler-level optimization requires direct access to the cores. By running a large number of tests on Oracle Cloud Infrastructure, we found the optimal configuration: a bare-metal cluster with low-latency RDMA networking. It minimizes calculation time on the one hand, increases calculation accuracy on the other, and, as a result, optimizes financial costs.
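
MPI parallelization of grid models is typically based on domain decomposition: the horizontal grid is cut into rectangular patches, one per MPI process, and neighboring processes exchange their boundary (halo) rows every time step, which is why network latency matters so much. WRF has its own decomposition logic; the pure-Python sketch below only shows the idea of splitting an nx by ny grid over a px by py process grid, and all names in it are hypothetical.

```python
def split_1d(n_cells: int, n_parts: int):
    """Split range(n_cells) into n_parts contiguous (start, stop) chunks, as evenly as possible."""
    if n_parts < 1 or n_parts > n_cells:
        raise ValueError("need 1 <= n_parts <= n_cells")
    base, extra = divmod(n_cells, n_parts)
    bounds, start = [], 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)  # spread the remainder over the first chunks
        bounds.append((start, start + size))
        start += size
    return bounds


def decompose(nx: int, ny: int, px: int, py: int):
    """Map each (hypothetical) MPI rank to its patch of an nx-by-ny grid.

    Returns {rank: ((x0, x1), (y0, y1))} with half-open index ranges,
    numbering ranks row by row over a px-by-py process grid.
    """
    x_chunks = split_1d(nx, px)
    y_chunks = split_1d(ny, py)
    patches = {}
    for j, y_bounds in enumerate(y_chunks):
        for i, x_bounds in enumerate(x_chunks):
            patches[j * px + i] = (x_bounds, y_bounds)
    return patches


for rank, patch in decompose(10, 7, px=3, py=2).items():
    print(rank, patch)
```

Splitting as evenly as possible keeps the load balanced, so no process waits long for the slowest one at each halo exchange; OpenMP threads can then share the work inside each patch.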

Read more about the hardware side of our calculations in the official Oracle blog.

In general, the "raw" forecasts that a forecasting model generates are very difficult to comprehend. They are huge binary files in various formats, with hundreds of different variables. Weather applications such as Windy.app present these forecast data in a format that is convenient and understandable for users: kitesurfers, sailors, fishermen, paragliders, and anyone interested in meteorology. Several times a day, we download a huge amount of data from all kinds of sources and weather models (both free and paid), process it, automatically check its reliability, and then store it in specialized storage. This way, our users get the most up-to-date and accurate forecast for any location in the world, from dozens of sources, as quickly and efficiently as possible.
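
One typical step in turning raw model output into something a kitesurfer can read: models usually store wind as eastward (u) and northward (v) components, for example the UGRD and VGRD fields in NOAA's GFS GRIB2 files. The sketch below converts them to speed in knots and the direction the wind blows from; the function name is hypothetical and this is not Windy.app's code.

```python
import math

MS_TO_KNOTS = 3600 / 1852  # 1 m/s in knots (1 knot = 1852 m per hour)


def wind_speed_direction(u_ms: float, v_ms: float):
    """Convert wind components to (speed in knots, direction in degrees).

    u_ms: eastward component, v_ms: northward component, both in m/s.
    The direction follows the meteorological convention: where the wind
    blows FROM, measured clockwise from north (0 = north, 90 = east).
    """
    speed_kn = math.hypot(u_ms, v_ms) * MS_TO_KNOTS
    direction = math.degrees(math.atan2(-u_ms, -v_ms)) % 360.0
    return speed_kn, direction


print(wind_speed_direction(0.0, -10.0))  # air moving south: a north wind
print(wind_speed_direction(5.0, 0.0))    # air moving east: a west wind
```

For u = 0 and v = -10 m/s it returns about 19.4 knots from 0°, that is, a north wind. Negating both components before `atan2` is what turns "where the air is going" into "where it comes from".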

Pavel: Why so many weather models? Models differ in accuracy across different parts of the Earth's surface. So, for the area and the sport you are interested in, you may end up choosing either a successful model or an unsuccessful one. This is why the same weather app can give quite different experiences for the same area, depending on the model.
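
A simple way to see which model performs best for a particular spot is to score each model's past forecasts against observations there, for example with the mean absolute error (MAE). The sketch below does this for a single variable; the model names and numbers are made up, and this is not necessarily how Windy.app compares models.

```python
def mean_absolute_error(forecasts, observations):
    """Average absolute difference between forecasts and observations."""
    if not forecasts or len(forecasts) != len(observations):
        raise ValueError("need equal-length, non-empty series")
    return sum(abs(f - o) for f, o in zip(forecasts, observations)) / len(forecasts)


def rank_models(model_forecasts, observations):
    """Return (model_name, MAE) pairs, most accurate model first."""
    scores = {
        name: mean_absolute_error(series, observations)
        for name, series in model_forecasts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical wind speeds (m/s) at one spot over three forecast hours.
observed = [5.0, 7.0, 6.0]
forecasts = {
    "model_a": [6.0, 8.0, 7.0],
    "model_b": [5.0, 7.5, 6.0],
}
print(rank_models(forecasts, observed))
```

Here the hypothetical `model_b` ranks first; with a longer record, the same scoring can be repeated per region, season, or variable.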

At Windy.app, we compare several forecasts provided by different models. Basically, Windy.app is a hub that aggregates various observation data and data from different models. We then structure them and present them to our users in an easy-to-understand format, while leaving users free to make their own decisions and act based on these data.

Timur Garifov / Unsplash

How we use Big Data to make more accurate weather forecasts

Ilya: Forecasting methods are constantly improving, and their accuracy increases over time. However, because weather processes are chaotic (small errors in the initial state grow over time), uncertainty is still very high, and that is what makes weather forecasting such a difficult task. This uncertainty can be assessed and reduced with methods for post-processing model forecasts.
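
The classic illustration of why initial-condition errors limit predictability is the Lorenz-63 system, a three-variable toy model of convection. The sketch below integrates it twice with the standard parameters (σ = 10, ρ = 28, β = 8/3) and a fourth-order Runge-Kutta scheme, starting from states that differ by one part in a million, and records how far apart the runs drift. It is a textbook demonstration, not a weather model, and the function names are hypothetical.

```python
def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the Lorenz-63 system at `state` = (x, y, z)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance `state` by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = lorenz63(state)
    k2 = lorenz63(shifted(state, k1, dt / 2))
    k3 = lorenz63(shifted(state, k2, dt / 2))
    k4 = lorenz63(shifted(state, k3, dt))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_history(perturbation=1e-6, dt=0.01, steps=4000):
    """Distance between two runs whose initial x differs by `perturbation`."""
    run_a = (1.0, 1.0, 1.0)
    run_b = (1.0 + perturbation, 1.0, 1.0)
    history = []
    for _ in range(steps):
        run_a, run_b = rk4_step(run_a, dt), rk4_step(run_b, dt)
        diff_sq = sum((a - b) ** 2 for a, b in zip(run_a, run_b))
        history.append(diff_sq ** 0.5)
    return history


history = separation_history()
print(f"after 1 step: {history[0]:.1e}, largest: {max(history):.1f}")
```

The separation starts near 10^-6 and eventually grows to the size of the attractor itself, at which point the two runs are effectively unrelated. This is the same reason real forecasts lose skill with lead time.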

Scientists apply a whole set of analytical operations to the resulting enormous amount of data to extract exactly the valuable information they need. One example of such post-processing is converting a forecast's water vapor data into the familiar notions of "fog" or "mist".
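
A minimal sketch of such a conversion, assuming the model provides air temperature and dew point: compute relative humidity with the Magnus formula and map it to a label. Operationally, fog and mist are distinguished by visibility (fog below 1 km, mist at 1 km or more), so the humidity thresholds `FOG_RH` and `MIST_RH` here are hypothetical illustration values, not a real diagnostic.

```python
import math

# Magnus-formula coefficients (Alduchov and Eskridge), valid roughly -40..50 °C.
MAGNUS_A = 17.625
MAGNUS_B = 243.04  # °C

# Hypothetical thresholds for illustration only.
FOG_RH = 97.0
MIST_RH = 90.0


def relative_humidity(temp_c: float, dewpoint_c: float) -> float:
    """Relative humidity in percent from air temperature and dew point (°C)."""
    gamma_t = MAGNUS_A * temp_c / (MAGNUS_B + temp_c)
    gamma_td = MAGNUS_A * dewpoint_c / (MAGNUS_B + dewpoint_c)
    return 100.0 * math.exp(gamma_td - gamma_t)


def humidity_label(temp_c: float, dewpoint_c: float) -> str:
    """Map near-surface humidity to a user-facing label (hypothetical rule)."""
    rh = relative_humidity(temp_c, dewpoint_c)
    if rh >= FOG_RH:
        return "fog"
    if rh >= MIST_RH:
        return "mist"
    return "none"


for temp, dew in ((10.0, 10.0), (10.0, 9.0), (20.0, 5.0)):
    print(temp, dew, round(relative_humidity(temp, dew)), humidity_label(temp, dew))
```

A real fog diagnostic would also use the model's cloud water near the ground, wind speed, and a visibility parameterization rather than humidity alone.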

To assess the probability and reliability of a forecast, ensemble modeling is also used. By running a whole set (or ensemble) of model runs with slightly different initial conditions, we obtain a set of possible scenarios for a given meteorological situation. These are terabytes of data that need to be processed statistically to obtain the probabilities of these scenarios. This is how, for example, the probable tracks of tropical hurricanes are forecast.
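
Once an ensemble has been run, the simplest statistical product is the probability of an event: the fraction of members in which it happens. The sketch below computes that, together with the ensemble mean and spread, for a single variable at one point; the function name and the 10-member wind-gust values are hypothetical.

```python
import statistics


def summarize_ensemble(member_values, threshold):
    """Return the ensemble mean, spread, and probability of exceeding `threshold`.

    The probability is simply the fraction of members above the threshold.
    """
    if not member_values:
        raise ValueError("ensemble must contain at least one member")
    exceed = sum(1 for value in member_values if value > threshold)
    return {
        "mean": statistics.mean(member_values),
        "spread": statistics.pstdev(member_values),
        "probability": exceed / len(member_values),
    }


# Hypothetical wind gusts (m/s) from a 10-member ensemble at one location.
gusts = [12, 15, 9, 18, 14, 11, 16, 13, 20, 10]
print(summarize_ensemble(gusts, threshold=15))
```

Three of the ten members exceed 15 m/s, so the estimated probability is 30%; a larger spread signals a less predictable situation. Hurricane track products apply the same counting idea to positions instead of a single number.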

By later comparing forecasts with the actual measurements from meteorological stations, we can assess forecast errors and find the patterns behind them. Then, by correcting these errors with statistical processing and machine learning methods, we can further increase the accuracy of our forecasts.
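
The simplest statistical correction of this kind removes a model's systematic bias: average the forecast-minus-observation error over a past period, then subtract it from new forecasts. The sketch below assumes a single station and a single variable; real post-processing (for example, Model Output Statistics or machine-learning corrections) conditions on many more predictors. Names and data are hypothetical.

```python
def mean_bias(past_forecasts, past_observations):
    """Average forecast-minus-observation error over a training period."""
    if not past_forecasts or len(past_forecasts) != len(past_observations):
        raise ValueError("need equal-length, non-empty series")
    errors = [f - o for f, o in zip(past_forecasts, past_observations)]
    return sum(errors) / len(errors)


def correct_bias(new_forecasts, bias):
    """Subtract the systematic bias from each new forecast value."""
    return [f - bias for f in new_forecasts]


# Hypothetical temperatures (°C): the model has been running 2 °C too warm here.
bias = mean_bias([22.0, 24.0, 21.0, 25.0], [20.0, 22.0, 19.0, 23.0])
print(bias, correct_bias([26.0, 23.5], bias))
```

A constant offset is only the first-order fix; the same train-then-correct pattern extends to regression or neural-network models that learn how the error depends on season, time of day, or weather type.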

Machine learning and deep learning methods are also already actively used in modern meteorology. There are many examples of neural networks successfully applied to improve forecast accuracy locally, for nowcasting (very short-range forecasting), and for tuning the parameters of computational models. Machine learning methods let us effectively find and use non-linear relationships among sets of meteorological variables; however, it is not yet possible to fully replace computational models with neural networks.

The more powerful the computer, the higher the resolution and the larger the area we can calculate in a given time. Moreover, computing power directly affects the speed of the whole system, and in some cases it may be critical to update the forecast as quickly as possible: for example, for a developing tropical hurricane, where literally every second of delay counts because the damage may be enormous.
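
A back-of-the-envelope estimate shows why resolution is so expensive. Assuming the area and the number of vertical levels stay fixed and the time step shrinks in proportion to grid spacing (the CFL condition), halving the grid spacing doubles the points in each horizontal direction and doubles the number of time steps, so cost grows roughly with the cube of the refinement factor. The helper below is a hypothetical illustration of that arithmetic.

```python
def relative_cost(dx_old_km: float, dx_new_km: float) -> float:
    """Rough cost multiplier for refining grid spacing over the same area.

    Assumes cost is proportional to horizontal grid points times time steps,
    with vertical levels unchanged and the time step proportional to dx.
    """
    if dx_old_km <= 0 or dx_new_km <= 0:
        raise ValueError("grid spacings must be positive")
    refinement = dx_old_km / dx_new_km
    # refinement**2 more horizontal points, refinement times more time steps
    return refinement ** 3


print(relative_cost(8.0, 4.0))
print(relative_cost(8.0, 3.0))
```

Refining an 8 km grid to 4 km costs about 8 times as much, and refining it to 3 km about 19 times as much over the same area, which is why high-resolution runs are usually limited to smaller regions.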

Finally, more powerful computers enable us to use more computationally complex methods of data assimilation and processing, which ultimately increases our forecasts' accuracy.

Timur Garifov / Unsplash

Pavel: Machine learning has helped make our forecasts more accurate. Forecasts will keep getting more accurate as we develop more powerful computers.

In meteorology, distributed computing is also used as an alternative to supercomputers. It is employed in projects where people can donate part of their computers' processing power to help with climate calculations. However, it is not as effective for weather forecasting as it is in other areas. The reason is that, roughly speaking, a forecast for tomorrow must be completed today, and the sooner the better. So distributed computing cannot be used to its full extent: it may add to the overall computing power, but the resulting computation speed is too low.

The atmosphere is so diverse that every weather situation that actually occurs is just one of millions of possible ones.

The future of meteorology is not about predicting the weather to within fractions of a percent and doing it better than the day before (forecasts are already quite accurate); it is about better predicting hurricanes and typhoons, squalls, heat waves, and extreme precipitation (rain, snow, and so on): everything that harms people and damages the economies of cities, regions, and countries.

Whoever learns to predict such events better and to present data about them in a convenient way will be the leader both in meteorology and in weather applications.

Text: Ilya Drigo and Pavel Konstantinov of the Windy.app team

Cover photo: Alex Kotliarskyi / Unsplash