As I understand it, message pruning is based on how long the prunable data has been on the blockchain, i.e. pruning is performed based on the message timestamp, which I understand can be coded very efficiently. But has it been considered to prune data not by timestamp, but by keeping a record of how much prunable data there is, meaning the total size of all prunable data?
When scanning the chain, a running total could be kept of the size of all prunable data, and when the maximum amount of total storage is reached, the oldest data could be pruned. Why and for whom this matters I'll explain further down.
Nodes would set the maximum amount of prunable data to keep in their config: normal nodes could for instance set a maximum of 500MB, while special, stronger nodes could set the total to for instance 10GB (the H2 database advertises that it can handle up to 4TB of data, more than enough for most use cases).
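To make the idea concrete, here is a minimal sketch of such size-based pruning, assuming a hypothetical PrunableEntry type and storage API (none of these names exist in the actual code):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SizeBasedPruner {

    // Hypothetical config value: e.g. 500MB for normal nodes, 10GB for archival ones.
    private final long maxPrunableBytes;
    // Entries are added in chain-scan order, so the head is always the oldest data.
    private final Deque<PrunableEntry> entries = new ArrayDeque<>();
    private long totalBytes = 0;

    public SizeBasedPruner(long maxPrunableBytes) {
        this.maxPrunableBytes = maxPrunableBytes;
    }

    // Called while scanning the chain: keep a running total of prunable data size
    // and evict the oldest entries once the configured maximum is exceeded.
    public void add(PrunableEntry entry) {
        entries.addLast(entry);
        totalBytes += entry.size();
        while (totalBytes > maxPrunableBytes && !entries.isEmpty()) {
            PrunableEntry oldest = entries.removeFirst();
            totalBytes -= oldest.size();
            oldest.delete(); // hypothetical: drop the payload, keep only its hash
        }
    }

    // Hypothetical entry type: timestamp for ordering, raw payload for sizing.
    public record PrunableEntry(int timestamp, byte[] data) {
        long size() { return data.length; }
        void delete() { /* remove the data from local storage */ }
    }
}
```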
The minimum lifetime of prunable data must be the same for all nodes, to ensure that a new node that is downloading the blockchain from scratch, or is a few days or weeks behind, will get the latest data that hasn't yet expired, regardless of which peer it is currently feeding from. Otherwise it gets complicated: if the minimum expiration time configured on your node is longer than on the peer you are downloading from, you can't get that data and need to look for another peer that has it, so the blockchain download algorithm would need to be modified, which is not justified.
Thinking of the "special" nodes that will keep a larger amount of prunable data on disk: these will probably belong to businesses that have a specific type of data they want to keep track of. Projects like nxtty (messaging), freemarket, or perhaps a torrent tracker, blogging platform, etc. come to mind here.
This made me think of allowing these technology providers to store only the data they need for their business, so they'll need a filtering mechanism to keep just the relevant data. The filter could for instance be based on the first bytes of each piece of prunable data, or on whether the data comes from a certain account; even more flexible would be a plugin-style system where you write a jar, put it on the classpath and reference it in the config (nxt.pruneFilter=com.foo.business.PruneFilter for instance). A sketch of what such a plugin could look like follows below.
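As a sketch of the plugin approach, assuming a hypothetical PruneFilter interface that the node loads by reflection from the configured class name (neither the interface nor the loader exists in Nxt):

```java
// Hypothetical plugin interface: a node keeps a piece of prunable data
// past the minimum lifetime only if the configured filter accepts it.
public interface PruneFilter {
    boolean shouldKeep(long senderId, long recipientId, byte[] data);
}

// How the node could instantiate the implementation referenced in the
// config, e.g. nxt.pruneFilter=com.foo.business.PruneFilter
class PruneFilterLoader {
    static PruneFilter load(String className) throws ReflectiveOperationException {
        return (PruneFilter) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
    }
}
```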
The maximum lifetime could indeed be made more flexible, although it still needs a dependency on timestamp, once the total size limit has been reached, to decide which data to delete first. But this is of interest to those special archival nodes only. Whether such configurability should be in the default client, or whether there should be a framework that allows a node to use a custom pruning filter plugin, well, we will decide once we get there. Let's first get enough users using the prunable messages and tagged data features; depending on the usage pattern we see, we can then focus on the type of prune filtering that will be most needed. The prunable_message table has sender and recipient columns, so filtering on those will not be difficult, and tagged_data also has fields like type and isText that can be used to filter out e.g. torrents or binary data.
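For illustration, account-based filtering at the SQL level could look roughly like this, assuming the columns are named sender_id and recipient_id; whether pruning deletes the whole row or only the payload depends on the actual schema, so treat this as a sketch only:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class AccountPruneFilter {

    // Prune all messages that neither come from nor go to the one
    // account this node's business cares about (the id is hypothetical).
    static int pruneAllExcept(Connection con, long keepAccountId) throws SQLException {
        try (PreparedStatement stmt = con.prepareStatement(
                "DELETE FROM prunable_message"
                + " WHERE sender_id <> ? AND recipient_id <> ?")) {
            stmt.setLong(1, keepAccountId);
            stmt.setLong(2, keepAccountId);
            return stmt.executeUpdate();
        }
    }
}
```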
Allow big (42KB) messages for the price of a 1 NXT fee. This would of course come at a cost in network bandwidth, but not in blockchain size. If all normal nodes simply ignore the prunable part and only store the message hash (since they consider the fee too low), then only the nodes that consider the data important (which they decide based on the filter mentioned in option 2.) would store the prunable data.
If your suggestion is to have prunable data with no minimum lifetime requirement at all, i.e. some nodes can decide to prune it immediately on reception and never keep it, then we again run into the problem that it cannot be guaranteed that such data will propagate to the nodes that need it, and when feeding from a node that does immediate pruning, you would need to switch to another one if you actually need the data.
Instead of fiddling with the blockchain download algorithm and making some peers special, the way to allow that should be the ability to download the prunable data later, with a separate request, from a node that advertises that it keeps the data longer. It can be done, just not in this release; we need additions to the peer networking protocol to allow sharing information about which peer keeps which data, and an API to request it.
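Since Nxt peers already exchange JSON requests, such a follow-up fetch could be a new request type. The "getPrunableData" name and its fields below are invented for illustration and are not part of the current peer protocol:

```java
import org.json.simple.JSONObject;

public class PrunableDataRequest {

    // Build a hypothetical peer request asking for the pruned payload
    // of a given transaction, identified by its transaction id.
    @SuppressWarnings("unchecked")
    static JSONObject build(String transactionId) {
        JSONObject request = new JSONObject();
        request.put("requestType", "getPrunableData"); // invented request type
        request.put("transaction", transactionId);     // id of the pruned transaction
        return request;
    }
}
```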