numpy - Correlation coefficients for sparse matrix in Python?


Does anyone know how to compute a correlation matrix from a very large sparse matrix in Python? Basically, I am looking for something like numpy.corrcoef that will work on a scipy sparse matrix.

You can compute the correlation coefficients fairly straightforwardly from the covariance matrix, like this:

    import numpy as np
    from scipy import sparse

    def sparse_corrcoef(a, b=None):

        # optionally stack a second set of variables (rows) under the first
        if b is not None:
            a = sparse.vstack((a, b), format='csr')

        a = a.astype(np.float64)

        # compute the covariance matrix
        # (see http://stackoverflow.com/questions/16062804/)
        # note: subtracting the row means returns a dense matrix
        a = a - a.mean(1)
        norm = a.shape[1] - 1.
        c = a.dot(a.T.conjugate()) / norm

        # the correlation coefficients are given by
        # c_{i,j} / sqrt(c_{i,i} * c_{j,j})
        d = np.diag(c)
        coeffs = c / np.sqrt(np.outer(d, d))

        return coeffs

Check that it works OK:

    # some smallish sparse random matrices
    a = sparse.rand(100, 100000, density=0.1, format='csr')
    b = sparse.rand(100, 100000, density=0.1, format='csr')

    coeffs1 = sparse_corrcoef(a, b)
    coeffs2 = np.corrcoef(a.todense(), b.todense())

    print(np.allclose(coeffs1, coeffs2))
    # True

Be warned:

The amount of memory required for computing the covariance matrix c will be heavily dependent on the sparsity structure of a (and b, if given). For example, if a is an (m, n) matrix containing just a single column of non-zero values, then c will be an (m, m) matrix containing all non-zero values (rows are treated as variables, as in numpy.corrcoef). If m is large, this could be very bad news in terms of memory consumption.
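One way to ease the memory pressure described in the warning is to avoid subtracting the row means from the matrix at all, since that subtraction turns the whole input into a dense matrix. The sketch below is not part of the original answer: the name `sparse_corrcoef_uncentered` is hypothetical, and it relies on the identity cov = (A Aᵀ − n·μμᵀ) / (n − 1), where μ holds the row means and n is the number of columns. The input stays sparse during the product, and only the final square result is dense.

```python
import numpy as np
from scipy import sparse

def sparse_corrcoef_uncentered(a):
    """Row-wise correlation coefficients without densifying the input.

    Hypothetical helper for illustration, not from the original answer.
    """
    a = sparse.csr_matrix(a, dtype=np.float64)
    n = a.shape[1]

    # row means as a flat 1-D array
    mu = np.asarray(a.mean(axis=1)).ravel()

    # sparse product; only this (rows, rows) result is made dense
    gram = (a @ a.T).toarray()

    # cov = (A A^T - n * mu mu^T) / (n - 1)
    cov = (gram - n * np.outer(mu, mu)) / (n - 1)

    # normalise by the standard deviations, as in sparse_corrcoef
    d = np.diag(cov)
    return cov / np.sqrt(np.outer(d, d))

# usage: compare against the dense reference on a small random matrix
x = sparse.random(20, 500, density=0.2, format='csr', random_state=0)
print(np.allclose(sparse_corrcoef_uncentered(x), np.corrcoef(x.toarray())))
```

The trade-off is numerical: subtracting n·μμᵀ from the raw product can lose precision when the means are large relative to the variances, so it is worth checking against the dense result on a small sample first. The output is still a dense square matrix with one row per variable, so this helps only when the number of rows is manageable.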

