In one of my recent projects I had to group data from a large sparse matrix. This was mainly to speed up the model fitting process.
The story in short: I couldn't find a decent solution, because most of the ones I came across convert the sparse matrix to dense form at some point in order to group over it. That is fine for a small matrix, but not for one whose dense form explodes into gigabytes.
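To put rough numbers on that blow-up, here is a back-of-the-envelope sketch in Python. The matrix dimensions and the number of non-zeros are made up for illustration, and the storage sizes (8-byte values, 4-byte row and column indices) are assumptions; real sparse formats carry some extra overhead.

```python
def dense_bytes(n_rows, n_cols, bytes_per_value=8):
    """Memory needed to store every cell, zeros included."""
    return n_rows * n_cols * bytes_per_value


def triplet_bytes(n_nonzero, bytes_per_value=8, bytes_per_index=4):
    """Memory for a triplet layout: one (row, column, value) entry per non-zero."""
    return n_nonzero * (bytes_per_value + 2 * bytes_per_index)


# Hypothetical matrix: 1,000,000 x 10,000 with 10,000,000 non-zero cells (0.1% filled).
n_rows, n_cols, n_nonzero = 1_000_000, 10_000, 10_000_000

print(f"dense:   {dense_bytes(n_rows, n_cols) / 1e9:.1f} GB")
print(f"triplet: {triplet_bytes(n_nonzero) / 1e6:.1f} MB")
```

Under these assumptions the dense form needs tens of gigabytes while the triplet form stays in the low hundreds of megabytes, which is why a detour through a dense matrix is not an option at this scale.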
So I wrote a function that exploits the sparse triplet structure to group a sparse matrix efficiently. Here it is, with an explanation.