I'm elbows deep in a refactor of my project. It relies on Django, PostgreSQL, and a bunch of math/science/statistics libraries, interfaces with Discord, and collects data from a 3rd party API. It all runs old-school style on a Debian VPS.
Every part of this was due for a refresh, as I'd version pinned all the key parts.
I've started this refactor and burned out on it a few times, but now there's a deadline: the introduction of this new 3rd party API. The current production system scrapes from a different data source, and the schema has changed dramatically.
So I'm working towards what I refer to as Version 4 of the project.
Because of this I'm daydreaming about what I'd do if I were doing a complete rewrite rather than this refactor.
I'm dumping these intrusive thoughts out here so I can focus on the immediate task to hand and return to these notes when I've got V4 out the door, all bedded in and "done".
1. 3rd party data requests should be cached
Keep every raw response as a JSON file: compress the "seldom used" ones and leave the test/dev data raw. This way, if required, the entire database can be rebuilt from this collection of data rather than having to make all the API calls again.
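A minimal sketch of the idea in Python, using only the standard library. The file naming scheme (a SHA-256 of the URL), the `fetch` callable and the function names are all assumptions for illustration, and for simplicity it compresses every file rather than only the seldom used ones.

```python
import gzip
import hashlib
import json
from pathlib import Path


def cache_path(cache_dir: Path, url: str) -> Path:
    """Map a request URL to a stable file name (hypothetical naming scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.json.gz"


def fetch_cached(cache_dir: Path, url: str, fetch):
    """Return parsed JSON for url, calling fetch(url) only on a cache miss.

    fetch is whatever actually talks to the 3rd party API (assumed to
    return JSON-serialisable data).
    """
    path = cache_path(cache_dir, url)
    if path.exists():
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            return json.load(fh)

    data = fetch(url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(data, fh)
    return data
```

Rebuilding the database then becomes a loop over the cache directory instead of a loop over API calls.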
2. output generated in response to a user request should be cached
Some percentage (subquest: calculate the ballpark) of the content generated in response to user requests won't change over time. All of these requests should return cached output.
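Roughly, this could look like the sketch below. The `immutable` flag stands in for whatever rule ends up deciding which requests never change, and the plain `dict` stands in for a real cache backend; both are assumptions, as are the function names.

```python
import hashlib
import json


def request_key(command: str, args: dict) -> str:
    """Build a stable cache key from a command and its arguments."""
    payload = json.dumps({"command": command, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def respond(cache: dict, command: str, args: dict, render, immutable: bool) -> str:
    """Return cached output when available, otherwise render and maybe store it.

    render(command, args) is the expensive function that generates the output.
    Only requests flagged as immutable are stored, so output that can still
    change over time is always regenerated.
    """
    key = request_key(command, args)
    if key in cache:
        return cache[key]
    output = render(command, args)
    if immutable:
        cache[key] = output
    return output
```

Sorting the keys before hashing means the same arguments in a different order still hit the same cache entry.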
3. split more things out into single-function processes, feed these from a job queue
Currently my system has a few large "apps" that do a complete process from start to finish. This slows down testing and debugging.
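As a toy illustration of the shape, here is an in-process version using the standard library `queue` module. A real setup would use a proper job queue; the handler names (`parse`, `store`) and the payload format are made up for the example.

```python
import queue

HANDLERS = {}   # job name -> function
STORED = []     # stand-in for the database


def handler(name):
    """Register a function as the handler for one kind of job."""
    def register(func):
        HANDLERS[name] = func
        return func
    return register


@handler("parse")
def parse(payload, jobs):
    # Does one thing only: turn raw input into a value, then queue the next step.
    jobs.put(("store", {"value": int(payload["raw"])}))


@handler("store")
def store(payload, jobs):
    # Does one thing only: persist the value.
    STORED.append(payload["value"])


def run_until_empty(jobs):
    """Process jobs one at a time until the queue is empty; return the count."""
    processed = 0
    while True:
        try:
            name, payload = jobs.get_nowait()
        except queue.Empty:
            return processed
        HANDLERS[name](payload, jobs)
        processed += 1
```

Each step can now be tested by calling it directly with a payload and inspecting what it puts on the queue, without running the whole pipeline.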
4. raw SQL queries as much as possible
Use a framework and its ORM where it makes sense (model definition, migrations, external API endpoints, user settings, etc), but for the core data store/retrieve, just do SQL. Obviously with a nice connection/cursor -> data structure wrapper.
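The wrapper can be very small. This sketch works with any DB-API cursor; it is written against `sqlite3` here only so it runs without a server, and the `fetch_dicts` name is my own. Note that placeholder syntax differs between drivers (`?` for sqlite3, `%s` for the PostgreSQL drivers).

```python
import sqlite3


def fetch_dicts(conn, sql, params=()):
    """Run a query and return each row as a dict keyed by column name."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        # cursor.description holds one entry per column; item 0 is the name.
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()
```

Usage, with a hypothetical `readings` table:

```python
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, score REAL)")
conn.execute("INSERT INTO readings (score) VALUES (?)", (1.5,))
rows = fetch_dicts(conn, "SELECT id, score FROM readings WHERE score > ?", (1.0,))
```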
5. a fast and efficient test harness is crucial
Many of the points above also combine into improving testing. For example, once the outputs of user commands are cached, it's simple to replay a command and compare outputs as an automated regression test. Similarly, splitting out the functions of the current large "self contained" processes makes testing far simpler.
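The replay idea could be as simple as the sketch below. It assumes each recorded entry is a dict with `command`, `args` and `output` keys, and that `run_command` is the function that produces fresh output; both are assumptions about a design that doesn't exist yet.

```python
def replay(recorded, run_command):
    """Re-run recorded commands and return every case where the output changed."""
    mismatches = []
    for entry in recorded:
        actual = run_command(entry["command"], entry["args"])
        if actual != entry["output"]:
            mismatches.append({
                "command": entry["command"],
                "args": entry["args"],
                "expected": entry["output"],
                "actual": actual,
            })
    return mismatches
```

An empty list means no regressions; anything else is a ready-made bug report.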
6. automate monitoring of key "steps" in the flow from user to the system and back to user
Set up appropriate calls to action when these have errors.
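One lightweight way to get there is a decorator around each key step. In this sketch the `alert` callable is a placeholder for whatever the call to action ends up being (a Discord message, an email, a log line), and the `monitored` name is hypothetical.

```python
import functools
import time


def monitored(step_name, alert):
    """Wrap a step so any exception triggers alert(step_name, exc, elapsed)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = time.monotonic()
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                elapsed = time.monotonic() - started
                alert(step_name, exc, elapsed)
                raise  # monitoring must not swallow the error
        return wrapper
    return decorate
```

The error is re-raised after the alert so the step's caller still sees the failure.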
7. monitor the incoming data from the 3rd party API for schema changes
Gracefully deal with as many edge cases as you can think of.
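A rough sketch of how schema drift could be detected: reduce a known-good response to its "shape" (keys and type names), do the same for each incoming response, and diff the two. The function names are my own, and it only looks at the first element of each list, which is a simplifying assumption.

```python
def schema_of(value):
    """Reduce a parsed JSON value to its shape: keys and type names only."""
    if isinstance(value, dict):
        return {key: schema_of(item) for key, item in value.items()}
    if isinstance(value, list):
        # Assumption: list items are homogeneous, so the first one is enough.
        return [schema_of(value[0])] if value else []
    return type(value).__name__


def schema_diff(expected, actual, path=""):
    """List the differences between two shapes produced by schema_of."""
    changes = []
    if isinstance(expected, dict) and isinstance(actual, dict):
        for key in sorted(expected.keys() | actual.keys()):
            child = f"{path}.{key}" if path else key
            if key not in actual:
                changes.append(f"missing: {child}")
            elif key not in expected:
                changes.append(f"added: {child}")
            else:
                changes.extend(schema_diff(expected[key], actual[key], child))
    elif expected != actual:
        changes.append(f"changed: {path} ({expected!r} -> {actual!r})")
    return changes
```

An empty diff means the response still matches; a non-empty one can be logged or alerted on before the bad data reaches the database.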
So yeah, tl;dr is: I want to use Elixir and Phoenix so much for Version 5 :)