I’m looking at my second dead board. The first one worked for a couple of days and then stopped talking on Meshtastic and Bluetooth. The second one worked for a couple of weeks, then died the same way.
I reflashed both boards by copying the uf2 file (firmware-rak4631-2.4.2.5b45303) to the USB drive created by the module. After flashing, the device does not show up on Bluetooth, Wi-Fi, or Meshtastic.
FWIW neither board was ever powered up without the antenna connected.
Battery voltage is 4.096 volts, and I have tried removing and re-applying power.
Your situation is not the first case. On the Meshtastic Discourse forum and on Discord, there are many reports of Meshtastic nodes that suddenly stop working, commonly a few days after outdoor deployment or installation.
It appears to me that the flash of your module could be corrupted (although the bootloader section is still working, since you can still drag in a uf2 file).
What you can do is upload Meshtastic_nRF52_factory_erase to hopefully clear any flash-related issue (note that dragging the file over is not enough: you still need to open a terminal and hit Enter to perform the erase).
Why does this happen? Flash can get corrupted during a reset or when the supply voltage is marginal or low. This is also the reason it happens only in remote deployments and after a few days.
This is the last update I can see on the Meshtastic GitHub (issue link), along with a possible fix for this issue. However, since the fix is likely to break some core functions of the firmware (as I understand it), the plan is to include it in the release of Meshtastic V3.
I am very doubtful that the power is going low enough to cause those issues, but given the lightning in the area, an EMP-caused reset is rather likely. I don’t have the schematics or I’d diagnose that. I suspect the reset circuit impedance is too high (high-value pull-up resistor, small cap) for this sort of use.
But this begs the question: why would anyone store volatile data like this in non-volatile memory? Flash/EE/whatever has a finite number of write cycles, in addition to all the power and timing constraints you noted. A volatile message log belongs in RAM. External serial RAM is cheap, and if a message gets scrambled, or if the whole thing gets scrambled, who cares?
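To make the RAM-log idea concrete, here is a minimal Python sketch of a bounded in-memory message log. It is only a host-side model of the concept, not Meshtastic code: the class name `RamMessageLog`, its methods, and the capacity of 32 are all hypothetical. The point is that old entries simply fall off the end and nothing is ever written to flash, so a reset can lose the log but cannot corrupt persistent storage.

```python
from collections import deque


class RamMessageLog:
    """Bounded in-memory message log (hypothetical model, not firmware code)."""

    def __init__(self, capacity=32):
        # deque with maxlen silently discards the oldest entry when full.
        self._entries = deque(maxlen=capacity)

    def append(self, sender, text):
        """Store one message; no non-volatile write takes place."""
        self._entries.append((sender, text))

    def latest(self, count):
        """Return up to `count` of the most recent messages, oldest first."""
        if count <= 0:
            return []
        return list(self._entries)[-count:]

    def __len__(self):
        return len(self._entries)


if __name__ == "__main__":
    log = RamMessageLog(capacity=3)
    for i in range(5):
        log.append("node", f"msg {i}")
    # Only the three newest messages remain in the log.
    print(log.latest(3))
```

On real hardware the same structure would be a fixed-size ring buffer in SRAM or external serial RAM; the trade-off is that the log is gone after every reset, which is acceptable if nobody cares about a scrambled or lost message.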
This would be a good discussion to have with the Meshtastic developers. They are active on GitHub and Discord on this topic, but I am not fully up to date.
If you think you have a stable supply, I agree that a transient electrical pulse or noise could trigger the reset. As for the solution, you can reinforce it in hardware (lower C0G capacitor on the reset pin, earthed metal enclosure, etc.), but I think the fix should be in the software. Even if the module resets 1000 times, there should be no lockup or flash-related issue.
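As an illustration of what "reset-safe" storage can look like in software, here is a small Python model of a common two-slot scheme: each save goes to the slot that does not hold the newest valid record, and a record only counts as valid if its CRC matches. This is a sketch under assumptions, not how Meshtastic or the nRF52 filesystem actually works. The slot size, record layout, and function names (`save`, `load`, `read_slot`) are all hypothetical, and flash is modelled as a plain `bytearray`.

```python
import struct
import zlib

SLOT_SIZE = 64                 # hypothetical slot size in bytes
HEADER = struct.Struct("<IH")  # sequence number, payload length
CRC = struct.Struct("<I")      # CRC32 over header + payload
ERASED = 0xFF                  # erased flash reads back as 0xFF


def new_flash():
    """Two erased slots, modelled as a bytearray."""
    return bytearray([ERASED]) * (2 * SLOT_SIZE)


def read_slot(flash, index):
    """Return (sequence, payload) if the slot holds a valid record, else None."""
    base = index * SLOT_SIZE
    seq, length = HEADER.unpack_from(flash, base)
    if length > SLOT_SIZE - HEADER.size - CRC.size:
        return None            # erased or torn header
    end = base + HEADER.size + length
    (stored_crc,) = CRC.unpack_from(flash, end)
    if zlib.crc32(bytes(flash[base:end])) != stored_crc:
        return None            # payload or CRC was only partly written
    return seq, bytes(flash[base + HEADER.size:end])


def newest_valid(flash):
    """Return (slot index, (sequence, payload)) for the newest valid record."""
    best = None
    for index in (0, 1):
        rec = read_slot(flash, index)
        if rec is not None and (best is None or rec[0] > best[1][0]):
            best = (index, rec)
    return best


def load(flash):
    """Return the newest valid payload, or None if nothing was ever saved."""
    best = newest_valid(flash)
    return None if best is None else best[1][1]


def save(flash, payload, interrupt_after=None):
    """Write payload to the older slot; the newest valid record is untouched.

    interrupt_after simulates a reset: only that many bytes get programmed.
    """
    best = newest_valid(flash)
    target = 0 if best is None else 1 - best[0]
    seq = 1 if best is None else best[1][0] + 1
    body = HEADER.pack(seq, len(payload)) + payload
    record = body + CRC.pack(zlib.crc32(body))
    if len(record) > SLOT_SIZE:
        raise ValueError("payload too large for one slot")
    base = target * SLOT_SIZE
    flash[base:base + SLOT_SIZE] = bytes([ERASED]) * SLOT_SIZE  # erase slot
    written = record if interrupt_after is None else record[:interrupt_after]
    flash[base:base + len(written)] = written


if __name__ == "__main__":
    flash = new_flash()
    save(flash, b"config v1")
    save(flash, b"config v2")
    save(flash, b"config v3", interrupt_after=3)  # reset mid-write
    print(load(flash))  # the torn slot is rejected, the older record is used
```

The design choice is that a reset at any point leaves at least one slot with a complete, CRC-checked record, so the device boots with the previous data instead of locking up. A CRC collision on a torn record is still possible in principle, so this reduces the risk rather than eliminating it.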