numpy - Correlation coefficients for sparse matrix in Python?
Does anyone know how to compute a correlation matrix from a large sparse matrix in Python? Basically, I am looking for something like numpy.corrcoef that will work on a scipy sparse matrix.
You can compute the correlation coefficients fairly straightforwardly from the covariance matrix, like this:
    import numpy as np
    from scipy import sparse

    def sparse_corrcoef(A, B=None):
        # optionally stack a second set of variables (rows) under the first
        if B is not None:
            A = sparse.vstack((A, B), format='csr')

        A = A.astype(np.float64)

        # compute the covariance matrix
        # (see http://stackoverflow.com/questions/16062804/)
        # note: subtracting the dense column of row means yields a dense matrix
        A = A - A.mean(1)
        norm = A.shape[1] - 1.
        C = A.dot(A.T.conjugate()) / norm

        # the correlation coefficients are given by
        # C_{i,j} / sqrt(C_{i,i} * C_{j,j})
        d = np.diag(C)
        coeffs = C / np.sqrt(np.outer(d, d))

        return coeffs

Check that it works OK:
    # some smallish sparse random matrices
    a = sparse.rand(100, 100000, density=0.1, format='csr')
    b = sparse.rand(100, 100000, density=0.1, format='csr')

    coeffs1 = sparse_corrcoef(a, b)
    coeffs2 = np.corrcoef(a.toarray(), b.toarray())

    print(np.allclose(coeffs1, coeffs2))
    # True

Be warned:
The amount of memory required for computing the covariance matrix C will depend heavily on the sparsity structure of A (and B, if given). For example, if A is an (m, n) matrix containing just a single column of non-zero values, then C will be an (m, m) matrix containing all non-zero values. If m is large, this could be very bad news in terms of memory consumption.
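One possible way to soften the memory cost is sketched below. It is not part of the answer above. It relies on the identity sum_k (a_ik - mu_i)(a_jk - mu_j) = sum_k a_ik a_jk - n * mu_i * mu_j, so the row means never have to be subtracted from the sparse matrix itself, a step that would otherwise turn it into a dense (m, n) array. The function name sparse_corrcoef_no_densify is hypothetical. The sketch assumes real-valued input and that the final (m, m) result still fits in memory. Bear in mind that this identity can lose precision when the row means are large compared with the spread of the data.

```python
import numpy as np
from scipy import sparse


def sparse_corrcoef_no_densify(A, B=None):
    """Row-wise correlation coefficients without densifying the (m, n) input.

    Hypothetical helper: assumes real-valued data and that an (m, m)
    dense result is affordable.
    """
    if B is not None:
        A = sparse.vstack((A, B), format='csr')
    A = sparse.csr_matrix(A, dtype=np.float64)

    n = A.shape[1]

    # row means as a flat 1-D array of length m
    mu = np.asarray(A.mean(axis=1)).ravel()

    # sparse product A A^T; only this (m, m) result is converted to dense
    gram = (A @ A.T).toarray()

    # covariance via the identity: (A A^T - n * mu mu^T) / (n - 1)
    cov = (gram - n * np.outer(mu, mu)) / (n - 1)

    # normalise by the standard deviations to get correlations
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)
```

Only the (m, m) Gram matrix is held densely here, so this mainly helps when n is much larger than m. If m itself is large, the output alone will dominate memory and this sketch does not help.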