Nxt Forum



Author Topic: Message pruning options  (Read 1244 times)

verymuchso

Message pruning options
April 23, 2015, 07:32:46 am

My previous posts on the subject have been overlooked, so I'll try to bundle my questions and proposals here.
Relevant questions: post 1, post 2.


Option 1.

As I understand it, message pruning is based on how long the prunable data has been on the blockchain: pruning is performed based on the message timestamp, which I understand can be coded very efficiently. But has it been considered to prune not by timestamp, but by keeping a record of how much prunable data there is, meaning the total size of all prunable data?
When scanning the chain, a running total could be kept of the size of all prunable data, and when the maximum amount of storage is reached, the oldest data could be pruned. Why and for whom this matters I'll explain further down.

Nodes would set the maximum amount of prunable data to keep in their config: normal nodes could, for instance, set a maximum of 500 MB, while special, stronger nodes could set the total to, say, 10 GB (H2 advertises that it can handle up to 4 TB of data, more than enough for most use cases).
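The running-total idea could be sketched roughly like this. This is a minimal illustration only, not NRS code; the class, the config value, and the oldest-first eviction policy are all assumptions:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of Option 1: prune by total size budget instead of
// timestamp alone. Names here are illustrative, not actual NRS classes.
class SizeBudgetPruner {
    private final long maxBytes;   // e.g. a node-local config value such as 500 MB
    private long totalBytes = 0;
    // oldest-first ordering: message timestamp -> total size at that timestamp
    private final TreeMap<Integer, Long> byTimestamp = new TreeMap<>();

    SizeBudgetPruner(long maxBytes) { this.maxBytes = maxBytes; }

    void add(int timestamp, long size) {
        byTimestamp.merge(timestamp, size, Long::sum);
        totalBytes += size;
        // evict the oldest entries until we are back under the budget
        while (totalBytes > maxBytes && !byTimestamp.isEmpty()) {
            Map.Entry<Integer, Long> oldest = byTimestamp.pollFirstEntry();
            totalBytes -= oldest.getValue();
            // a real node would delete the corresponding rows from the database here
        }
    }

    long totalBytes() { return totalBytes; }
}
```

A normal node would construct this with a small budget, an archival node with a large one; the data structure itself is identical.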

Option 2.

Thinking of the "special" nodes that will keep a larger amount of prunable data on disk: these will probably belong to businesses that have a particular type of data they want to keep track of. Projects like nxtty (messaging), freemarket, or perhaps a torrent tracker or blogging platform come to mind here.
This made me think of allowing these technology providers to store only the data they need for their business, so they'll need a filtering mechanism to keep just the relevant data. The filter could, for instance, be based on the first bytes of each piece of prunable data, or on whether the data comes from a certain account. Even more flexible would be a plugin-style system where you write a jar, put it on the classpath, and reference it in the config (nxt.pruneFilter=com.foo.business.PruneFilter, for instance).
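Such a plugin might look something like this sketch. The PruneFilter interface, its method, and the example implementation are all hypothetical; neither the interface nor the nxt.pruneFilter property exists in NRS:

```java
// Hypothetical plugin interface for Option 2: a node loads one implementation
// from its classpath and asks it whether to keep each piece of prunable data.
interface PruneFilter {
    // return true if this node should keep the prunable data
    boolean shouldKeep(long senderId, byte[] data);
}

// Example implementation a business might ship in its own jar:
// keep everything from a watched account, or anything tagged with a
// project-specific prefix in the first bytes of the data.
class PrefixAndAccountFilter implements PruneFilter {
    private static final byte[] MAGIC = {(byte) 0xCA, (byte) 0xFE}; // made-up prefix
    private final long watchedAccount;

    PrefixAndAccountFilter(long watchedAccount) { this.watchedAccount = watchedAccount; }

    @Override
    public boolean shouldKeep(long senderId, byte[] data) {
        if (senderId == watchedAccount) return true;
        return data.length >= 2 && data[0] == MAGIC[0] && data[1] == MAGIC[1];
    }
}
```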

Why do it this way, you ask? To lower the barrier to entry. Tech providers could of course run their own database systems and put all the data they want to keep track of in those, but what if NXT provided this out of the box? Simply stick an NXT node on a server, set the config to your project's specifics, and your backend works instantly. For starting projects this could prove very valuable.

Option 3.

Allow big (42 KB) messages for the price of a 1 NXT fee. This would of course come at a cost in network bandwidth, but not in blockchain size. If all normal nodes simply ignore the prunable part and store only the message hash (since they consider the fee too low), then only the nodes that consider the data important (which they decide based on the filter mentioned in option 2) would store the prunable data.
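The "store only the message hash" behaviour could be sketched as follows. This is illustrative only; how NRS actually hashes prunable parts may differ:

```java
import java.security.MessageDigest;

// Sketch of Option 3 on a normal node: keep only a digest of the prunable
// part, which is still enough to verify the data later if an archival node
// supplies it on request.
class HashOnlyStore {
    static byte[] digest(byte[] prunablePart) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(prunablePart);
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    // check data later retrieved from an archival node against the stored hash
    static boolean verify(byte[] storedHash, byte[] suppliedData) {
        return MessageDigest.isEqual(storedHash, digest(suppliedData));
    }
}
```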

Please let me know your thoughts.
HEAT | lompsa.com | FIMK hosted wallet | Realtime NXT activity | NXT+

verymuchso

Re: Message pruning options
April 23, 2015, 08:21:31 pm

I've been looking some more into using H2 as a production database; opinions on the topic seem to vary widely.
I guess it really depends on the type of project.

Main memory requirements:
The larger the database, the more main memory is required. With the current storage mechanism (the page store), the minimum main memory required is around 1 MB for each 8 GB database file size.

Database file size limit: 4 TB (using the default page size of 2 KB) or higher (when using a larger page size).
This limit is including CLOB and BLOB data.

I'm not at all sure how that performs in practice.
But I guess it comes down to requiring only some 500 MB of memory to run a 4 TB H2 database.
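A quick sanity check of that figure, using the 1 MB per 8 GB rule quoted above (taking 4 TB as 4096 GB):

```java
// Back-of-the-envelope check of H2's page store memory requirement:
// roughly 1 MB of main memory per 8 GB of database file size.
class H2MemoryEstimate {
    static long minMemoryMB(long dbSizeGB) {
        return dbSizeGB / 8; // 1 MB per 8 GB of file size
    }
}
```

So a 4 TB database would need about 512 MB of main memory just for the page store, consistent with the "some 500 MB" estimate.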

------------

Please let me know your thoughts on the OP.

ScripterRon

Re: Message pruning options
April 24, 2015, 01:40:45 pm

I modified NRS last year to replace H2 with PostgreSQL. The performance was worse because PostgreSQL ran in a separate address space and used TCP/IP to connect to NRS (this was on Windows; it might be faster on Linux using cross-memory sockets). PostgreSQL offers a number of improvements for large databases, since it can spread the database across multiple disks for concurrent access, although I don't know if that matters much with SSDs. And PostgreSQL requires a database administrator to set things up.

I've also used NRS with the MVSTORE storage mechanism in H2 1.4. It provides performance improvements and real-time database compacting, but it is still in beta.

It might be feasible to design a database plugin capability for NRS.  PostgreSQL is pretty much compatible with H2, although there are some datatype differences and some of the triggers are handled differently.
NXT-XM86-4ZNA-65L5-CDWUE

verymuchso

Re: Message pruning options
April 26, 2015, 02:35:46 pm

Yes, PostgreSQL and H2 seem to be almost compatible out of the box. Also interesting that you tried the MVSTORE storage engine; I've seen it and wondered how it would perform.
I tried running NRS on MySQL some time ago, but had to stop halfway because of other work that came up.
It did look like it could be done just by changing the H2-specific field types into their MySQL counterparts.
But now, with the addition of Lucene, I'm not sure it would be as easy as before.

-----------

Still wondering about the question in the OP.
The arguments/options seem valid and could be of use to many companies building on NXT.

Riker

Re: Message pruning options
April 26, 2015, 09:10:54 pm

I think your ideas are valid. The challenges in defining more complex rules are usually:
1. Popping off blocks to move to a better chain.
2. Protecting the network against various attacks.
Therefore the approach of keeping it simple and being conservative with message size and fees makes sense until real-life use cases emerge that require more complexity.

Regarding server-side plugins such as the pruning filter you suggested, the challenges here are:
1. Configuration complexity, since users would have to change the server classpath.
2. Potential for instability in case users write bad code in their filters (entering an infinite loop, allocating too much memory, calling System.exit(), etc.).
3. Incompatible versions of the plugin and the NRS code.
4. Potential security issues from malicious users spreading plugins to steal information or cause a DDoS attack.

NXT Core Dev
Account: NXT-HBFW-X8TE-WXPW-DZFAG
Public Key: D8311651 Key fingerprint: 0560 443B 035C EE08 0EC0  D2DD 275E 94A7 D831 1651

Jean-Luc

Core Dev
Re: Message pruning options
April 27, 2015, 09:26:18 am

Quote
As I understand it, message pruning is based on how long the prunable data has been on the blockchain: pruning is performed based on the message timestamp, which I understand can be coded very efficiently. But has it been considered to prune not by timestamp, but by keeping a record of how much prunable data there is, meaning the total size of all prunable data?
When scanning the chain, a running total could be kept of the size of all prunable data, and when the maximum amount of storage is reached, the oldest data could be pruned. Why and for whom this matters I'll explain further down.

Nodes would set the maximum amount of prunable data to keep in their config: normal nodes could, for instance, set a maximum of 500 MB, while special, stronger nodes could set the total to, say, 10 GB (H2 advertises that it can handle up to 4 TB of data, more than enough for most use cases).
The minimum lifetime of prunable data must be the same for all nodes, to ensure that a new node that is downloading the blockchain from scratch, or is a few days or weeks behind, will get the latest data that hasn't yet expired, regardless of which peer it is currently feeding from. Otherwise it gets complicated: if the minimum expiration time configured on your node is longer than on the peer you are downloading from, you can't get that data and need to look for another peer that has it, so the blockchain download algorithm would need to be modified, which is not justified.

Quote
Thinking of the "special" nodes that will keep a larger amount of prunable data on disk: these will probably belong to businesses that have a particular type of data they want to keep track of. Projects like nxtty (messaging), freemarket, or perhaps a torrent tracker or blogging platform come to mind here.
This made me think of allowing these technology providers to store only the data they need for their business, so they'll need a filtering mechanism to keep just the relevant data. The filter could, for instance, be based on the first bytes of each piece of prunable data, or on whether the data comes from a certain account. Even more flexible would be a plugin-style system where you write a jar, put it on the classpath, and reference it in the config (nxt.pruneFilter=com.foo.business.PruneFilter, for instance).
The maximum lifetime could indeed be made more flexible, although it still needs a dependency on timestamp, to decide which data to delete first once the total size limit has been reached. But this is of interest only to those special archival nodes. Whether such configurability should be in the default client, or whether there should be a framework that lets a node use a custom pruning-filter plugin, we will decide once we get there. Let's first get enough users using the prunable messages and tagged data features; depending on the usage patterns we see, we can then focus on the type of prune filtering that is most needed. The prunable_message table has sender and recipient columns, so filtering on those will not be difficult, and tagged_data also has fields like type and isText that can be used to filter off, for example, torrents or binary data.
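The column-based filtering described here could be modelled roughly like this. The TaggedData class below is only a stand-in for the fields mentioned in the post (sender, type, isText), not actual NRS code:

```java
// Minimal model of a tagged_data row, limited to the fields mentioned above.
class TaggedData {
    final long sender;
    final String type;
    final boolean isText;
    TaggedData(long sender, String type, boolean isText) {
        this.sender = sender; this.type = type; this.isText = isText;
    }
}

// Two example archival-node filters built on those columns.
class ArchivalFilters {
    // keep only textual data, e.g. to filter off torrents and other binary blobs
    static boolean keepTextOnly(TaggedData d) { return d.isText; }

    // keep only data sent from a watched account
    static boolean keepFromSender(TaggedData d, long account) { return d.sender == account; }
}
```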

Quote
Allow big (42 KB) messages for the price of a 1 NXT fee. This would of course come at a cost in network bandwidth, but not in blockchain size. If all normal nodes simply ignore the prunable part and store only the message hash (since they consider the fee too low), then only the nodes that consider the data important (which they decide based on the filter mentioned in option 2) would store the prunable data.
If your suggestion is to have prunable data with no minimum lifetime requirement at all, i.e. some nodes can decide to prune it immediately on reception and never keep it, then we again run into the problem that such data cannot be guaranteed to propagate to the nodes that need it; when feeding from a node that does immediate pruning, you would need to switch to another one if you actually need the data.
Instead of fiddling with the blockchain download algorithm and making some peers special, the approach should be to allow downloading the prunable data later, with a separate request, from a node that advertises that it keeps the data longer. It can be done, just not in this release; we need additions to the peer networking protocol to allow sharing information about which peer keeps which data, and an API to request it.
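The deferred-download idea might look roughly like this sketch. The Peer interface and both of its methods are assumptions for illustration, not the actual NRS peer networking API:

```java
import java.util.List;

// Hypothetical view of a peer that may or may not advertise archival storage.
interface Peer {
    boolean advertisesArchival();
    byte[] requestPrunableData(byte[] transactionHash); // null if already pruned
}

// Fetch pruned data later, with a separate request, trying archival peers in
// turn and falling back to the next one on a miss.
class PrunableFetcher {
    static byte[] fetch(List<Peer> peers, byte[] hash) {
        for (Peer p : peers) {
            if (!p.advertisesArchival()) continue;  // normal nodes won't have it
            byte[] data = p.requestPrunableData(hash);
            if (data != null) return data;          // first archival peer that still has it
        }
        return null; // no peer keeps this data any longer
    }
}
```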

GPG key fingerprint: 263A 9EB0 29CF C77A 3D06  FD13 811D 6940 E1E4 240C
xmpp: jeanlucpicard@jabber.ccc.de EAFA3A2E 33B21A52 370CE6D4 35A4B325 3ED22061
NXT-X4LF-9A4G-WN9Z-2R322

verymuchso

Re: Message pruning options
April 29, 2015, 04:48:38 pm

Quote
I think your ideas are valid. The challenges in defining more complex rules are usually:
1. Popping off blocks to move to a better chain.
2. Protecting the network against various attacks.
Therefore the approach of keeping it simple and being conservative with message size and fees makes sense until real-life use cases emerge that require more complexity.

Regarding server-side plugins such as the pruning filter you suggested, the challenges here are:
1. Configuration complexity, since users would have to change the server classpath.
2. Potential for instability in case users write bad code in their filters (entering an infinite loop, allocating too much memory, calling System.exit(), etc.).
3. Incompatible versions of the plugin and the NRS code.
4. Potential security issues from malicious users spreading plugins to steal information or cause a DDoS attack.

Nicely put; thank you for your feedback.
All of this is of course targeted at the bigger "institutional" users, for whom these difficulties would likely not be a problem.
You are right to start out conservative, but it is also only a small step for a larger entity to fork the code and go its own way.
Perhaps catering proactively to their needs could tip the decision to either fork or build on the NXT chain more in favor of NXT.

verymuchso

Re: Message pruning options
April 29, 2015, 05:01:57 pm

Quote
The minimum lifetime of prunable data must be the same for all nodes, to ensure that a new node that is downloading the blockchain from scratch, or is a few days or weeks behind, will get the latest data that hasn't yet expired, regardless of which peer it is currently feeding from. Otherwise it gets complicated: if the minimum expiration time configured on your node is longer than on the peer you are downloading from, you can't get that data and need to look for another peer that has it, so the blockchain download algorithm would need to be modified, which is not justified.

In reply to your remarks:

I see now that a minimum amount of time the data needs to stay on the chain is indeed a must, to make sure data propagates correctly through the network,
especially for nodes that come online and need to download the chain. I did seem to forget about that.
But with a correct propagation mechanism in place this would not be a problem. I'll continue thinking on the topic.

Thank you for your feedback
Dirk


kunibopl

Re: Message pruning options
August 30, 2016, 09:10:04 am

Can someone tell me how getPrunableMessages&account works exactly?
In the API wiki I don't find this info. Where would I have to look in the code?
I noticed that only messages of a certain length qualify as prunable messages.
You can see this when you compare
http://localhost:7876/nxt?requestType=getPrunableMessages&account=NXT-YXC4-RB92-F6MQ-2ZRA6
with
https://mynxt.info/account/NXT-YXC4-RB92-F6MQ-2ZRA6
The first link gives only transactions with a certain message length.

I need to display all prunable messages regardless of their length.