
Newsletter Dane i Analizy, 2021-09-13

A weekly dose of links: the archive of the Dane i Analizy newsletter

Still Using the OS Module in Python? This Alternative is Remarkably Better
Python’s OS module is a nightmare for managing files and folders. You should try Pathlib. File and folder management with Python’s os module is a nightmare. Yet, it’s an essential part of every data science workflow. Saving reports, reading configuration files, you name it — there’s no way around it. Picture this — you spend weeks building an API around your model, and it works flawlessly, at (…)
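As a rough illustration of the kind of file handling the excerpt mentions (saving reports, reading files back), here is a minimal pathlib sketch. It is not taken from the linked article; the `save_report` helper, the `reports` folder, and the file name are hypothetical names chosen for the example.

```python
import tempfile
from pathlib import Path


def save_report(base_dir: Path, name: str, text: str) -> Path:
    """Write text to base_dir/reports/name, creating the folder if needed.

    Hypothetical helper for illustration only.
    """
    report_dir = base_dir / "reports"  # the "/" operator joins path segments
    report_dir.mkdir(parents=True, exist_ok=True)
    report_path = report_dir / name
    report_path.write_text(text, encoding="utf-8")
    return report_path


# Use a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = save_report(Path(tmp), "summary.txt", "accuracy: 0.93\n")
    print(path.name, path.suffix, path.exists())
    print(path.read_text(encoding="utf-8"))
```

The same steps with the os module would typically combine `os.path.join`, `os.makedirs`, and `open`; pathlib keeps them on a single `Path` object.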

The Surprisingly Interesting Mathematics within Chutes and Ladders
I wrote a MATLAB program that simulates 100,000 Chutes and Ladders games. Here are the results. Introduction: Chutes and Ladders is a game based on complete random chance. There is no strategy involved whatsoever. Your pawn movement completely relies on what number you spin on every turn. While this makes it a rather boring game for some, it provides those interested (…)

How to Create a Radar Chart in Python
A radar chart is a visualization technique used to compare multiple variables. This is a tutorial on how to create a radar chart in Python. A radar chart, also called a spider chart or web chart, is a graphical method for comparing multiple quantitative variables. It is a two-dimensional polar visualization. This is a tutorial on how to prepare a radar chart in Python. Import Libraries: We will (…)

9 Pandas value_counts() tricks to improve your data analysis
The pandas value_counts() function returns a Series containing counts of unique values. By default, the resulting Series is in descending order without any NA values. For example, let’s get counts for the column “Embarked” from the Titanic dataset.
    >>> df['Embarked'].value_counts()
    S    644
    C    168
    Q     77
    Name: Embarked, dtype: int64
The Series returned by value_counts() is in descending order by (…)

Deep understanding of the ARIMA model
Hands-on Tutorials. Explore the features of time series: stationarity, stability, autocorrelation. Generally, a model for time-series forecasting can be written as [Eq 0.2: Definition of the time-series forecasting model], where yₜ is the variable to be forecast (the dependent variable, or response variable), t is the time at which the forecast is made, h is the forecast horizon, Xₜ is the variables (…)

Fast AutoML with FLAML + Ray Tune
Microsoft researchers have developed FLAML (Fast Lightweight AutoML), which can now use Ray Tune for distributed hyperparameter tuning to scale up FLAML’s resource-efficient and easily parallelizable algorithms across a cluster. Figure: one of FLAML’s algorithms, CFO, tuning the number of leaves and the number of trees for XGBoost. The two heatmaps show the loss and cost distribution of all (…)

PixelCNN’s Blind Spot
Limitations of the PixelCNN and how to fix them! Written by Walter Hugo Lopez Pinaya, Pedro F. da Costa, and Jessica Dafflon. Hi everybody! Today, we will continue the series about autoregressive models and focus on one of the biggest limitations of PixelCNNs (i.e., blind spots) and how to fix it. Summary: Autoregressive models (PixelCNN); Modelling data with multiple channels (…)

GloVe Research Paper Explained
An intuitive understanding and explanation of the math behind the GloVe model. In continuation of my word2vectors research paper explained blog, I have taken up the GloVe research paper [Pennington et al.] to explain my understanding of this very detailed and comprehensive paper. GloVe stands for Global Vectors, where “global” refers to global statistics of (…)

Deploying Machine Learning Models Into A Website Using Flask
From model development to application … an interesting (and sometimes unpleasant) journey. After more than 3 years of studying critical concepts in the fields of Data Science, developing the necessary skills to implement, design and evaluate Machine Learning and Predictive models has become the norm. Yet, its applications continue to increase at an exponential rate as the demand for professionals (…)

A Beginner’s Guide to Image Processing With OpenCV and Python
This article was published as a part of the Data Science Blogathon. Introduction: We all know the phrase “Every picture can tell us a story”. There could be a lot of information hidden inside an image, and we could interpret it in different ways and from different perspectives. So, what is an image, and how to deal with […]

Asynchronous Loading of Large Datasets in Tensorflow
This article was published as a part of the Data Science Blogathon. Introduction: There are many tutorials, video lectures, and other materials on the Web discussing the basic principles of building neural networks, their architecture, learning strategies, etc. Traditionally, neural networks are trained by presenting image packets from the training sample to the neural network […]

A Gentle Introduction to Graph Neural Networks
This article is one of two Distill publications about graph neural networks. Take a look at Understanding Convolutions on Graphs to understand how convolutions over images generalize naturally to convolutions over graphs. Graphs are all around us; real world objects are often defined in terms of their connections to other things. A set of objects, and the connections between them, are naturally (…)

Why most A/B Tests are Not Efficient
Demystifying Pure Exploration (Part 0). A/B testing has many application areas, such as drug testing. For several decades now, A/B testing has been a mainstay of statistics, becoming the bedrock upon which the entire edifice of controlled randomized testing, that most sacred of scientific corroboration techniques, has been built. Given the plethora of articles on the topic on this site, I (…)

A Step By Step Guide To AI Model Development
In 2019, Venturebeat reported that almost 87% of data science projects do not get into production. Redapt, an end-to-end technology solution provider, also reported a similar number of 90% ML models not making it to production. However, there has been an improvement. In 2020, enterprises realized the need for AI in their business. Due to…

Object detection with YOLOv3 With Tensorflow 2.0
This article was published as a part of the Data Science Blogathon. Introduction: In this tutorial, we will discuss the implementation of the YOLO object detection system in TensorFlow 2.0. YOLO is the latest object detection system (network). It was designed by Joseph Redmon. Models in the YOLO family are exceptionally fast and far outperform R-CNN […]

Bayesian Linear Regression in Python: Using Machine Learning to Predict Student Grades Part 2
Implementing a Model, Interpreting Results, and Making Predictions In Part One of this Bayesian Machine Learning project, we outlined our problem, performed a full exploratory data analysis, selected our features, and established benchmarks. Here we will implement Bayesian Linear Regression in Python to build a model. After we have trained our model, we will interpret the model parameters and use (…)

Deploy NLP model Using Flask. Machine learning model deployment with…
Natural Language Processing. Machine learning model deployment with a static HTML page. This article will take you on the journey of deployment with Flask and a static HTML page. These two tools play an important role in building web apps for data science and machine learning. The machine learning field becomes more interesting after deploying the model to their own and (…)

Build Your First Discord Bot Using Python
This article was published as a part of the Data Science Blogathon. Introduction: Hello everyone. In this article, we shall be coding a bot for Discord using just Python. Let us begin and jump into the process without further ado. A brief about Discord for those who don’t already know: Discord is basically a one-stop voice […]

Machine Learning & Cyber Security
Autumn is here. The kids have gone back to school, and we are starting another episode of the Biznes Myśli podcast. Today’s guest is Mirosław Mamczur. Mirek has appeared in an episode before, but that was only a very short statement right after the course, because Mirek is a graduate of the course „Praktyczne uczenie maszynowe od podstaw” (“Practical Machine Learning from Scratch”). That was the first edition. For me, for DataWorkshop, and for Mirek, it was the first course he took part in. (…)

PySpark Neural Network from Scratch
A simple tutorial to learn how to implement a Shallow Neural Network (3 fully connected layers) using PySpark. Foreword This article is not intended to provide mathematical explanations of neural networks, but only to explain how to apply the mathematical equations to run it using Spark (MapReduce) logic in Python. For simplicity, this implementation only uses RDDs (and no DataFrames). Similarly, (…)

Integrating Google Maps API using Python and JavaScript
A coding guide on how to incorporate the Google Maps API into a webpage using Python and JS. After downloading and saving this CSV file in the base directory, the data is processed into a format that can be used in different domains: JSON. Let the file for doing the above task be named processdata.py. In this case, preprocessing simply includes: loading and reading the CSV file using pandas and (…)

A Practical Introduction to Grid Search, Random Search, and Bayes Search
Hands-on tutorial to effectively use Hyperparameter tuning in Machine Learning In Machine Learning, hyperparameters refer to the parameters that cannot be learned from data and need to be provided before training. The performance of machine learning models relies heavily on finding the optimal set of hyperparameters. Hyperparameter tuning basically refers to tweaking the hyperparameters of the (…)
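To make the idea of tuning concrete, here is a minimal pure-Python sketch of grid search and random search. It is not the linked tutorial's code: `validation_loss` is a toy stand-in for training and scoring a model, and the parameter names and values in `grid` are assumptions picked for the example. Bayes search is left out because it needs a surrogate model.

```python
import itertools
import random


def validation_loss(learning_rate: float, depth: int) -> float:
    """Toy stand-in for 'train a model and score it'; minimum at (0.1, 4)."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 4) ** 2


# Hypothetical search space: these values are fixed before training.
grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}


def grid_search(grid):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    candidates = [
        dict(zip(names, values)) for values in itertools.product(*grid.values())
    ]
    return min(candidates, key=lambda params: validation_loss(**params))


def random_search(grid, n_trials, seed=0):
    """Evaluate n_trials randomly sampled combinations and return the best one."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(values) for name, values in grid.items()}
        for _ in range(n_trials)
    ]
    return min(candidates, key=lambda params: validation_loss(**params))


print(grid_search(grid))
print(random_search(grid, n_trials=5))
```

Grid search tries all 9 combinations here, while random search evaluates only the 5 it samples, so it is cheaper but may miss the best setting.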

What does word2vec actually learn?
And how to train embeddings from similarity functions Representing discrete objects by continuous vectors, the so-called embeddings, has been at the heart of many successful machine learning solutions. The superiority comes from the fact that, unlike the original discrete objects, the embedding vectors offer a compact representation that captures the similarity between the original objects. In (…)

Simple GitHub Integration with VSCode
GitHub is a good way to share your code and keep it secure; VSCode is a great editor that works seamlessly with GitHub. I am no GitHub expert. I don’t need to be. I don’t need to share code across development teams or to have multiple versions of code that will eventually be merged into a single product or do any of the other wonderful things you can do with GitHub. What I want is to have a copy (…)

how async/await works in Python
Mark functions as async. Call them with await. All of a sudden, your program becomes asynchronous: it can do useful things while it waits for other things, such as I/O operations, to complete. Code written in the async/await style looks like regular synchronous code but works very differently. To understand how it works, one should be familiar with many non-trivial concepts including (…)
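A minimal sketch of the pattern described in the excerpt, using only the standard asyncio module. The `fetch` coroutine is a hypothetical stand-in for an I/O operation (it just sleeps); it is not code from the linked article.

```python
import asyncio
import time


async def fetch(name: str, delay: float) -> str:
    """Stand-in for an I/O call; awaiting asyncio.sleep yields control to the loop."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list:
    # gather() schedules both coroutines concurrently on one event loop
    # and returns their results in the order they were passed in.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Because both waits overlap, the total run time should be close to a single delay rather than the sum of the two.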

5 Tips for Analysts (and their Managers)
I’ve worked professionally as an analyst for just over two years in large retail organisations. Depending on the company you work for, analysts are either seen as an asset or just a numbers grunt, and this pigeon holing in my experience is driven by the insecurity of some managers when it comes to being more data driven and not decisions based on their gut feeling, elsewise known as the HiPPO ( (…)

Running Timeseries Anomaly Detection at Scale on SQL Data
Multi-dimensional data, SQL, Pandas, and Prophet. Time is probably the most important dimension for metrics. In the business world, business executives, analysts, and product managers track metrics over time. In the startup world, VCs want metrics to grow 5% week-on-week. In public stock markets, long-term investors evaluate metrics on a quarter-on-quarter basis to make (…)


This link roundup is compiled by a bot, so please forgive any oddities ;-)
