We will install Docker, Docker Compose and Dockstation on Fedora 34 using official guidelines and set up a pyspark-notebook image as an example.

So, let’s get started right away.

First switch to the root user:

su root

Alternatively, you can add sudo before each following command.

Install Docker Engine on Fedora

Uninstall old versions

dnf remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-selinux \
docker-engine-selinux \

Set up the repository

dnf -y install dnf-plugins-corednf config-manager \
--add-repo \

Install Docker Engine

dnf install docker-ce docker-ce-cli containerd.io

Start Docker

systemctl start docker

Verify that Docker…

Scikit-learn is a Python module for machine learning that provides a lot of regression, classification, and clustering algorithms.

We will run several classification models and compare their accuracy score on data from Taarifa and the Tanzanian Ministry of Water.

Full code is available on GitHub.

First, let’s import required libraries

import pickle
import plotly.express as px
from scipy import stats
import xgboost as xgb
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from imblearn.over_sampling import SMOTE, ADASYN
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.compose import ColumnTransformer
from sklearn.neural_network import MLPClassifier
from sklearn.svm…

Regression models are widely used as statistical technique for prediction the outcome based on observed data.

Linear regressions allows describe how dependent variable (outcome) changes relatively to independent variable(s) (feature, predictor).

When there is one independent variable and one dependent, it is called simple linear regression (SLR).
When there is more than one independent variable and one dependent, it is called multiple linear regression (MLR).

A simple linear regression equation looks like:

y = a + bx

x — the independent (explanatory) variable,
y — the dependent (responce) variable,
a — intercept,
b — slope of the line (coefficient).

Data science is an interdisciplinary field which focuses on making inferences from large data sets. This field includes data cleaning, manipulation, analysis, visualization and presentation of findings in order to inform a high-level decisions in an organization. As such, it incorporates skills from computer science, mathematics, statistics, information visualization, graphic design, and business.

Big data quickly become a vital tool for business everywhere from Amazon’s sales to TV and drug discovery. Data scientists are responsible for breaking down big data into usable information that guides decision making. The impact of big data in our days can not be over estimated…

Vadym Byesyedin

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store