This report summarizes the work done in my GSoC 2019 project, Enhancement of Statistics Module with SymPy. A step-by-step account of the project's development is available at czgdp1807.github.io.
I am a third-year Bachelor of Technology student in the Department of Computer Science and Engineering at the Indian Institute of Technology Jodhpur.
The project plan was focused on the following areas of statistics that were required to be added to
- Community Bonding - I was supposed to add the Dirichlet distribution, the multivariate Ewens distribution, the multinomial distribution, the negative multinomial distribution, and the generalized multivariate log-gamma distribution to
- Phase 1 - I was supposed to work on stochastic processes, primarily on Markov chains, including their API design, algorithms and implementation.
- Phase 2 - I was expected to work on random matrices, including Gaussian ensembles and matrices with random expressions as their elements.
- Phase 3 - I planned to work on assumptions of dependence, on improving the results generated by sympy.stats, and on improving other modules so that sympy.stats can function properly.
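The multinomial distribution in the Community Bonding plan has a simple closed-form PMF. For n trials over k categories, P(X_1 = x_1, ..., X_k = x_k) = n! / (x_1! ... x_k!) * p_1^x_1 ... p_k^x_k. Below is a minimal plain-Python sketch of that formula. It uses exact `Fraction` arithmetic as a rough analogue of SymPy's exact results. It is not SymPy's API, and the function name `multinomial_pmf` is hypothetical.

```python
from fractions import Fraction
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Exact multinomial PMF (hypothetical helper, not part of sympy.stats).

    counts: observed count x_i for each category; n = sum(counts) trials.
    probs:  category probabilities p_i; must sum exactly to 1.
    """
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    probs = [Fraction(p) for p in probs]
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1")
    n = sum(counts)
    # Multinomial coefficient n! / (x_1! * ... * x_k!), always an integer
    coeff = factorial(n) // prod(factorial(x) for x in counts)
    # Multiply by p_1^x_1 * ... * p_k^x_k using exact rational arithmetic
    return coeff * prod(p ** x for p, x in zip(probs, counts))


# Example: one success in each of three equally likely categories
print(multinomial_pmf([1, 1, 1], [Fraction(1, 3)] * 3))  # 2/9
```

Passing probabilities as `Fraction`s keeps the sum-to-one check exact. With binary floats, that check could fail spuriously.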
This section describes the actual work done during the coding period in terms of merged PRs.
- Community Bonding
#16576: This PR added
#16808 : This PR added
#16810 : This PR improved the API of Sum by allowing Range as the limits.
#16825 : Continuing this work, this PR added the GeneralizedMultivariateLogGamma distribution. This was an interesting one because of the complexity of its PDF.
#16834 : This PR enhanced the NegativeMultinomial distributions by allowing symbolic dimensions for them.
- Phase 1
#16897 : This was related to sympy.core and helped remove a disparity in the results of the special function
#16908 : This PR improved sympy.stats.frv by allowing conditions with foreign symbols.
#16913 : This removed the unreachable code from
#16914 : This PR allowed symbolic dimensions to
#16929 : This one was for the sympy.tensor module. It optimized ArrayComprehension and covered some corner cases.
#16981 : This PR added the base architecture for stochastic processes. It also added discrete Markov chains to
#17030 : This PR added features such as joint_distribution to stochastic processes.
#17046 : This PR added some common properties of discrete Markov chains, such as the fundamental matrix and the fixed row vector.
- Phase 2
#16934 : The bug fixes for sympy.stats.joint_rv_types were completed, and further work has been handed over to my fellow student, Ritesh.
#16962 : This was a continuation of the phase 1 work on allowing symbolic dimensions in finite random variables. As planned, this PR was merged in phase 2 after some changes.
#17083: This PR laid the groundwork for the next one and motivated it. The algorithm that got merged was somewhat difficult to extend and maintain. Thanks to Francesco, whose comment motivated me to rethink the whole framework.
#17163 : This was one of the most challenging PRs of the project because it involved redesigning the algorithm, refactoring the code and a lot of careful thinking. The details can be found at this comment.
- Phase 3
#17174 : In this PR, Gaussian ensembles were added to
#17304 : While working on the above PR, I got the idea to open this one to add circular ensembles to sympy.stats. I learned a lot about the Haar measure while working on it.
#17306: This PR added matrices with random expressions as their elements. The challenging part of this PR was generating canonical results so that the tests would pass.
#17336 : This was a bug fix in Matrix. Take a look at an example here.
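Two of the discrete Markov chain properties from Phase 1 (#17046) follow textbook definitions. The first is the fundamental matrix N = (I - Q)^(-1), where Q is the transient-to-transient block of an absorbing chain. The second is the fixed row vector w, which satisfies wP = w and sums to 1 for a regular chain. The plain-Python sketch below computes both with exact fractions. It is not the SymPy code from the PR, and the helper names (`solve`, `inverse`, `fundamental_matrix`, `fixed_row_vector`) are hypothetical.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square, nonsingular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bv)] for row, bv in zip(A, b)]
    for col in range(n):
        # Swap in a row with a nonzero pivot, then scale the pivot to 1
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[n] for row in M]


def inverse(A):
    """Invert A column by column by solving A x = e_j."""
    n = len(A)
    cols = [solve(A, [int(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def fundamental_matrix(Q):
    """N = (I - Q)^(-1): N[i][j] is the expected number of visits to
    transient state j when starting from transient state i."""
    n = len(Q)
    return inverse([[int(i == j) - Fraction(Q[i][j]) for j in range(n)] for i in range(n)])


def fixed_row_vector(P):
    """Row vector w with w P = w and sum(w) == 1 (regular chain assumed)."""
    n = len(P)
    # Equation j: sum_i w_i * P[i][j] - w_j = 0, i.e. the rows of (P^T - I)
    A = [[Fraction(P[i][j]) - int(i == j) for i in range(n)] for j in range(n)]
    b = [Fraction(0)] * n
    # These equations are linearly dependent, so replace the last with sum(w) = 1
    A[-1] = [Fraction(1)] * n
    b[-1] = Fraction(1)
    return solve(A, b)


# Regular two-state chain
P = [[Fraction(1, 2), Fraction(1, 2)], [Fraction(1, 4), Fraction(3, 4)]]
print(fixed_row_vector(P))  # [Fraction(1, 3), Fraction(2, 3)]

# Random walk on 0..4 absorbed at 0 and 4; Q covers transient states 1, 2, 3
h = Fraction(1, 2)
Q = [[0, h, 0], [h, 0, h], [0, h, 0]]
print(fundamental_matrix(Q))
```

Exact rational arithmetic keeps the results symbolic-looking and free of rounding. For large chains, a floating-point or sparse solver would be the usual choice.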
This section contains some of my PRs related to miscellaneous issues, such as workflow improvements.
The following PRs are open and in the final stages before merging. Any interested student can look at them to extend my work in their own GSoC project.
#17387 : This PR aims to add support for assumptions of dependence among random variables, like,
#17146 : This PR, which fixes and upgrades the Range set, is in its final stages; we are finalizing a few things, such as changes in the output of Range. As planned, I succeeded in writing exhaustive and systematic tests.
Apart from the above, the densities of circular ensembles remain to be implemented. For details, see Theorem 3 on page 8 of this paper.
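For orientation, here is the joint eigenvalue density usually quoted for Dyson's circular ensembles (COE, CUE and CSE correspond to β = 1, 2 and 4). This is the standard textbook form with its normalization constant. It is not necessarily the exact statement of the theorem referenced above.

```latex
p_\beta(\theta_1,\dots,\theta_n)
  = \frac{1}{Z_{n,\beta}} \prod_{1 \le j < k \le n} \left| e^{i\theta_j} - e^{i\theta_k} \right|^{\beta},
  \qquad \theta_j \in [0, 2\pi),
\qquad
Z_{n,\beta} = (2\pi)^n \, \frac{\Gamma\!\left(1 + \beta n / 2\right)}{\Gamma\!\left(1 + \beta / 2\right)^{n}}
```

As a quick check, for n = 2 and β = 2 the average of |e^{iθ_1} - e^{iθ_2}|^2 = 2 - 2cos(θ_1 - θ_2) over the torus is 2. This matches Γ(3)/Γ(2)^2 = 2.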