September 2024
I'm thrilled to have been awarded 2nd prize in the SMAC Challenge. Here's a bit more about the journey.
The Seismic Monitoring and Analysis Challenge was organized as part of ECML-PKDD 2024. The goal was to classify satellite images to determine whether the depicted region had been affected by an earthquake and, if so, to estimate its magnitude. The satellite data, collected with Sentinel-1 radar technology, captures vertically polarized signals and their reflections in both vertical and horizontal planes. These reflections reveal crucial information about ground deformation over time.
The challenge, based on the QuakeSet dataset available in TorchGeo, involved two key tasks:
1. Classification: decide whether the depicted region had been affected by an earthquake.
2. Regression: estimate the magnitude of the earthquake.
An additional leaderboard criterion was the number of Floating-Point Operations (FLOPs) required for inference. This encouraged competitors to design models that were not only accurate but also efficient and scalable.
Initially, I struggled to achieve good results with complex neural network architectures, including convolutional models, transformers, and encoders. I thought that examining the samples themselves might offer valuable insight into how to approach these tasks.
And then...
Many of the unaffected samples looked identical to the naked eye. And indeed, most showed no discernible difference between pixels. Why? The satellite images cover areas of about 20 km², meaning each pixel represents roughly 39 m² of terrain. For visible changes to occur at this scale, a significant event would be necessary. This made the classification task fairly straightforward: by checking whether the images remained the same, I could easily label most unaffected regions. For the remaining samples, I used LightGBM, an efficient gradient boosting method.
For the regression task, I also relied on LightGBM for the non-identical samples. Interestingly, the winning solution by Giorgio Morales was even simpler: he used the mean magnitude of the events in the training data, which minimized the error metric. Sometimes, the simplest solution is the best.
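To see why that constant baseline is hard to beat, note that among all constant predictors the training mean minimizes the squared error. A small sketch with synthetic magnitudes (the real values come from QuakeSet, and this assumes a squared-error metric):

```python
import numpy as np

# Synthetic stand-in for the training-set magnitudes.
rng = np.random.default_rng(1)
train_mags = rng.uniform(4.0, 7.5, size=200)

mean_pred = train_mags.mean()

def mse(constant, y):
    """Mean squared error of a constant predictor."""
    return float(np.mean((constant - y) ** 2))

# Sweep a grid of constant predictions: the error curve bottoms out at the mean.
candidates = np.linspace(3.0, 8.0, 101)
errors = [mse(c, train_mags) for c in candidates]
best = candidates[int(np.argmin(errors))]
```

Since a constant predictor needs zero FLOPs at inference time, it is also unbeatable on the efficiency criterion.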
This competition reinforced an important lesson: simple strategies often outperform complex ones. Many problems can be solved with basic approaches, and deep learning should not be the default choice. Also, examining data carefully before diving into a problem isn't just about gathering statistics and metrics; it's about genuinely understanding the data and leveraging domain expertise. Machine learning is a tool, not a silver bullet, and it should be used in conjunction with the vast knowledge built by experts over decades, or even centuries.
All of this is not a critique, but a reminder: I should always prioritize simplicity, efficiency, and explainability before opting for more elaborate and complex strategies.
Winning second place earned me co-authorship of the challenge paper, and I had the chance to present my solution at ECML-PKDD 2024.
The complete code is available on GitHub. It comes in the form of Jupyter notebooks with more extensive information about the methods used. A complete report is also available here.