Python has been the most popular language for machine learning. Compared to C, C++ and even Java, its syntax is simpler. Python also offers a rich ecosystem of libraries that make it easier to use for many purposes, including machine learning.
One such free software machine learning library for Python is Scikit-learn (aka sklearn). This library features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and DBSCAN. Scikit-learn is designed to interoperate with Python's numerical and scientific libraries, NumPy and SciPy.
Recently, Scikit-learn 0.23 was released. This update brings several notable features apart from the usual bug fixes and improvements. Let's take a look at these five features:
- Visual representation of estimators in Notebooks
- Improvements to K-means
- Improvements to gradient boosting
- New generalized linear models
- Sample weight support for existing regressors
Visual representation of estimators in Notebooks
When using Jupyter notebooks, the global display='diagram' option can be enabled with Scikit-learn's set_config() function. This provides a visual summary of the structure of pipelines and composite estimators used in the notebook. The resulting diagrams are interactive, with sections such as pipelines and transformers that can be expanded.
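As a minimal sketch, the diagram display can be switched on and then triggered by evaluating a pipeline in a notebook cell (the pipeline below is just an illustrative two-step example):

```python
from sklearn import set_config
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Enable the global diagram display introduced in 0.23.
set_config(display='diagram')

# An example pipeline; any composite estimator works the same way.
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression()),
])

pipe  # as the last expression in a notebook cell, this renders the interactive diagram
```

Outside a notebook the object simply prints as text; the HTML diagram only appears in environments with rich display support.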
Improvements to K-means
In this new update, the implementation of k-means has been revamped, making it faster and more stable. Parallelism is now handled through OpenMP, so the joblib-based n_jobs parameter of KMeans is deprecated. Also, with the new update, the Elkan algorithm supports sparse matrices.
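A minimal sketch of the new behavior, using a hypothetical toy dataset of two well-separated blobs stored as a sparse matrix and passed to the Elkan variant:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated 2-D blobs, as a sparse matrix.
X = csr_matrix(np.array([
    [0.0, 0.0], [0.2, 0.1],
    [5.0, 5.0], [5.1, 4.9],
]))

# As of 0.23, algorithm='elkan' also accepts sparse input.
# Note there is no n_jobs here; parallelism is handled via OpenMP.
km = KMeans(n_clusters=2, algorithm='elkan', n_init=10, random_state=0).fit(X)
print(km.labels_)  # the two blobs end up in different clusters
```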
Improvements to gradient boosting
With this new update, both HistGradientBoostingClassifier and HistGradientBoostingRegressor have received multiple improvements. Early stopping is now supported, and it is enabled by default for datasets with more than 10,000 samples. The new update has also enabled support for monotonic constraints, which allow predictions to be constrained to increase or decrease with specific features.
New generalized linear models
Long-awaited generalized linear models with non-normal loss functions are now available in Scikit-learn 0.23. Three new regressors have been implemented: PoissonRegressor, GammaRegressor and TweedieRegressor.
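As a minimal sketch, PoissonRegressor can be fitted on hypothetical count data (e.g. events per interval); it uses a log link and Poisson deviance loss, with alpha controlling L2 regularization:

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.RandomState(0)
# Hypothetical count data: counts drawn with log-rate equal to the feature.
X = rng.uniform(0, 2, size=(500, 1))
y = rng.poisson(lam=np.exp(X[:, 0]))

# Fit a Poisson GLM; predictions are always non-negative rates.
glm = PoissonRegressor(alpha=1e-3).fit(X, y)
print(glm.coef_)  # should be close to the true coefficient of 1.0
```

GammaRegressor and TweedieRegressor follow the same fit/predict interface, differing only in their assumed target distribution.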
Sample weight support for existing regressors
Apart from the new regressors, the new update also adds sample weight support to a pair of existing regressors, Lasso and ElasticNet. Sample weights are easy to use: an array with one entry per sample in the dataset is passed to the regressor's fit() method.
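A short sketch on hypothetical data, up-weighting a subset of samples via the sample_weight argument of fit():

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
# Hypothetical data: the target depends only on the first of three features.
X = rng.normal(size=(100, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

# One weight per sample; here the first ten samples count five times as much.
weights = np.ones(100)
weights[:10] = 5.0

# As of 0.23, Lasso (and ElasticNet) accept sample_weight in fit().
lasso = Lasso(alpha=0.1).fit(X, y, sample_weight=weights)
print(lasso.coef_)
```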
It's been about thirteen years since the inception of Scikit-learn, and the world of machine learning has evolved significantly in that time. Scikit-learn remains a very popular choice for canonical machine learning techniques, especially for applications in experimental science and in data science. Today, its API also serves as the reference framework for developing interoperable machine learning components outside the core library.
To know more about important machine learning libraries in Python, like Scikit-learn, you can enroll in our Machine Learning with Python training and certification course. This course covers Python concepts such as file operations, sequences and object-oriented programming, along with some of the most commonly used Python libraries. It introduces participants to machine learning concepts through real-world scenarios and projects. We regularly update our curriculum to make sure all the latest concepts are included in the course material. To know more about the machine learning course, reach out to us today!