Plenty of articles describe this hello world of Machine Learning. I will merely list some references and personal notes – primarily for my own convenience.
The objective: get a first hands-on exposure to machine learning – using a well-known example (Iris classification) and commonly used technology (Python). After this first step, a second step seems logical: doing the same thing with my own set of data.
Useful Resources:
- To set up an isolated environment in which to work with Python and friends: How to Create a Linux Virtual Machine For Machine Learning Development With Python 3 (Jason Brownlee) – http://machinelearningmastery.com/linux-virtual-machine-machine-learning-development-python-3/
- To work through a well known example of machine learning using Python: Your First Machine Learning Project in Python Step-By-Step (Jason Brownlee) – http://machinelearningmastery.com/machine-learning-in-python-step-by-step/
- Machine Learning Notebook – example of step by step data analysis pipeline on Iris data set: https://github.com/rhiever/Data-Analysis-and-Machine-Learning-Projects/blob/master/example-data-science-notebook/Example%20Machine%20Learning%20Notebook.ipynb
Starting time: 6.55 AM
6.55 AM Download and install latest version of Oracle VirtualBox (5.1.22)
7.00 AM Download Fedora 64-bit ISO image (https://getfedora.org/en/workstation/download/)
7.21 AM Create Fedora VM and install Fedora Linux on it from the ISO image (creating users root/root and python/python); reboot and complete the installation; run dnf update (850 MB of updates, 1348 upgrade actions – I regret this step); install the VirtualBox Guest Additions (non-trivial) using this article: https://fedoramagazine.org/install-fedora-virtualbox-guest/
8.44 AM Save a Snapshot of the VM to retain its fresh, mint, new car smell condition.
8.45 AM Install Python environment for Machine Learning (Python plus relevant libraries; possibly install Notebook server)
8.55 AM Save another snapshot of the VM in its current state
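A quick sanity check after the install is to ask Python which library versions it can see. This is only a minimal sketch: the list of libraries (scipy, numpy, matplotlib, pandas, sklearn) is my assumption of what a typical Iris tutorial needs, and library_versions is a hypothetical helper name, not something taken from the articles above.

```python
import importlib
import sys


def library_versions(names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Not every module exposes __version__; fall back to a placeholder.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("python:", sys.version.split()[0])
    # Assumed library list; adjust it to whatever the tutorial you follow needs.
    wanted = ["scipy", "numpy", "matplotlib", "pandas", "sklearn"]
    for name, version in library_versions(wanted).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any line reporting NOT INSTALLED points at a library that still has to be added before starting on the tutorial.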
Now that the environment has been prepared, it is time for the real action – based on the second article in the list of resources.
10.05 AM Start on machine learning notebook sample – working through Iris classification
10.15 AM Done with sample; that was quick. And pretty impressive.
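For my own reference, the core of such an Iris classification fits in a few lines. The following is a minimal sketch using scikit-learn, not the exact code from the tutorial or the notebook: the k-nearest-neighbours classifier, the 80/20 split and the random seed are my own assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# The Iris data set ships with scikit-learn: 150 flowers, 4 measurements each.
iris = load_iris()

# Hold back 20% of the flowers to test the model on data it has not seen.
# The split ratio and the seed are arbitrary choices for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=7, stratify=iris.target
)

# k-nearest neighbours is one simple classifier; others would work as well.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"accuracy on {len(y_test)} held-out flowers: {accuracy:.2f}")
```

Swapping KNeighborsClassifier for another scikit-learn estimator leaves the rest of the sketch unchanged, since the estimators share the same fit and predict interface.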
It seems the Anaconda distribution of Python may be valuable to use. I have downloaded and installed it from https://www.continuum.io/downloads.
Note: to make the contents of a shared host directory available to all users:
cd (go to the home directory of the current user)
mkdir share (create the directory share in the home directory of the user)
sudo mount -t vboxsf Downloads ~/share/ (this makes the shared folder called Downloads on the VirtualBox host available as directory share in the Fedora guest)
Let’s see about this thing with Jupyter Notebooks (formerly known as IPython Notebooks). Installing the Jupyter notebook is discussed here: https://github.com/rasbt/python-machine-learning-book/blob/master/code/ch01/README.md. Since I installed Anaconda (4.3.1 for Python 3.6), I already have the Jupyter app installed.
With the following command, I download a number of notebooks:
git clone https://github.com/rhiever/Data-Analysis-and-Machine-Learning-Projects
Let’s try to run one.
cd /home/python/Data-Analysis-and-Machine-Learning-Projects/example-data-science-notebook
jupyter notebook 'Example Machine Learning Notebook.ipynb'
And the notebook opens in my browser.
I can run the notebook, walk through it step by step, edit the notebook’s contents and run the changed steps. Hey mum, I’m a Data Scientist!
Oh, it’s 11.55 AM right now.
Some further interesting reads to get going with Python, Pandas and Jupyter Notebooks – and with data:
- 10-minute intro to Pandas – data analysis library for Python – http://pandas.pydata.org/pandas-docs/stable/10min.html
- Tutorials to get going with Pandas – http://pandas.pydata.org/pandas-docs/stable/tutorials.html
- Processing JSON with Python and analyzing the data as well (including map and heatmap) – https://www.dataquest.io/blog/python-json-tutorial/
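To give a flavour of what these reads cover, here is a minimal sketch of loading JSON records into a Pandas DataFrame and summarizing them. The records are made up purely for illustration (they do not come from any of the linked tutorials), and the column names city and temperature are hypothetical.

```python
import json

import pandas as pd

# Made-up records for illustration only; not taken from any real data set.
raw = """
[
  {"city": "A", "temperature": 10.0},
  {"city": "A", "temperature": 12.0},
  {"city": "B", "temperature": 20.0},
  {"city": "B", "temperature": 24.0}
]
"""

# Parse the JSON text into a list of dicts, then into a DataFrame
# with one row per record and one column per key.
records = json.loads(raw)
frame = pd.DataFrame(records)

# Group the rows by city and average the temperature within each group.
mean_by_city = frame.groupby("city")["temperature"].mean()

print(frame)
print(mean_by_city)
```

With real data, the same pattern would start from a file (for example via json.load on an open file handle) instead of an inline string.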