A weekly dose of links: the archive of the Dane i Analizy newsletter
MLOps with Docker and Jenkins: Automating Machine Learning Pipelines
How to containerize your ML models with Docker and automate a pipeline with Jenkins
Performing Deduplication with Record Linkage and Supervised Learning
Identifying duplicate records with a machine-learning approach. Introduction: Most data are recorded manually by humans and are often never reviewed or synchronized, so they end up containing mistakes such as typos. Think for a second: have you ever filled out the same form twice, but with a slight difference in your address? For example, you submitted a form like this (…)
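The excerpt only sets up the problem. As a rough illustration of the record-linkage idea, here is a minimal sketch that compares every pair of records with a string-similarity score and flags pairs above a threshold. It swaps in the standard-library difflib for the supervised model the article builds; the records, field names, and the 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records: 1 and 2 describe the same person, with typos in name and address.
records = [
    {"id": 1, "name": "John Smith", "address": "12 Baker Street"},
    {"id": 2, "name": "Jon Smith", "address": "12 Baker Stret"},
    {"id": 3, "name": "Mary Jones", "address": "5 High Road"},
]


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] after lowercasing and trimming; 1.0 means identical."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def candidate_duplicates(rows, threshold=0.85):
    """Compare every pair of rows (record linkage within one table) and return
    (id_a, id_b, score) for pairs whose mean field similarity reaches the threshold."""
    pairs = []
    for left, right in combinations(rows, 2):
        score = (
            similarity(left["name"], right["name"])
            + similarity(left["address"], right["address"])
        ) / 2
        if score >= threshold:
            pairs.append((left["id"], right["id"], round(score, 3)))
    return pairs


print(candidate_duplicates(records))  # flags records 1 and 2 as likely duplicates
```

Comparing all pairs is quadratic in the number of records; real record-linkage pipelines usually add a blocking step (only comparing records that share, say, a postcode) and, as in the article, replace the fixed threshold with a trained classifier.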
A Comprehensive Guide on Databricks for Beginners
This article was published as a part of the Data Science Blogathon. Overview: Databricks, in simple terms, is a web-based data warehousing and machine learning platform developed by the creators of Spark. But Databricks is much more than that. It is a one-stop product for all data needs, from data storage to analyzing data and deriving insights using SparkSQL, […]
Apache Spark Performance Optimization for Data Engineers
This article was published as a part of the Data Science Blogathon. Introduction: Apache Spark is a big data processing framework that has long been one of the most popular and most frequently encountered tools in all kinds of Big Data projects. It successfully combines speed of execution with simplicity for the developer expressing […]
20+ VS Code Shortcuts I Can’t Live Without
Not the fanciest shortcuts, but the ones I use heavily every day. I crave productivity when it comes to software development, and VS Code has a lot to offer. Out of the 753 actions in keybindings.json, I'd like to share the ones I use most. These aren't the fanciest ones, but I really can't live without them. 1. Command Palette: every action you can do in VS Code is here. This (…)
How To Access And Query Your Google BigQuery Data Using Python And R
Overview: In this post, we see how to load Google BigQuery data using Python and R, and then query it to get useful insights. We use the Google Cloud BigQuery library to connect to BigQuery from Python, and the bigrquery library to do the same from R. We also look at the two steps of working with BigQuery data from Python/R: connecting to Google BigQuery and (…)
HTTPS for Developers – DEV Community
This article lives in: Dev.to, Medium, GitHub, and the FastAPI docs. Intro: Here's a brief introduction to HTTPS for developers. 🔒 This article is extracted from the FastAPI docs about HTTPS. I just upgraded those docs with several explanations and diagrams, and I thought the end result was generic and useful enough for many other developers (even in other languages and frameworks) to also publish it as (…)
Predicting Wine Prices with Hyperparameter Tuning
What is hyperparameter tuning? Many popular machine learning libraries use the concept of hyperparameters. These can be thought of as configuration settings or controls for your machine learning model. While many parameters are learned or solved for while fitting your model (think regression coefficients), some inputs require a data scientist to specify values up front. These are the (…)
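To make the distinction concrete: in the sketch below, the regression coefficient w is learned from the data during fitting, while the regularization strength alpha is a hyperparameter set up front and tuned by trying a grid of values against a held-out validation set. It is a toy one-feature ridge regression without an intercept, written in plain Python; the data points and the alpha grid are made up for illustration and are unrelated to the article's wine dataset.

```python
# Hypothetical toy data: y is roughly 2 * x with a little noise.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
valid_x, valid_y = [5.0, 6.0], [10.1, 11.9]
alphas = [0.0, 0.1, 1.0, 10.0]  # candidate hyperparameter values (the "grid")


def fit_ridge(xs, ys, alpha):
    """Closed-form one-feature ridge regression without intercept:
    w = sum(x * y) / (sum(x * x) + alpha). w is the learned parameter."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mse(w, xs, ys):
    """Mean squared error of predictions w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train, valid, grid):
    """Fit once per hyperparameter value on the training split and keep
    the value with the lowest validation error."""
    best_alpha, best_err = None, float("inf")
    for alpha in grid:
        w = fit_ridge(*train, alpha)
        err = mse(w, *valid)
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err


best_alpha, best_err = grid_search((train_x, train_y), (valid_x, valid_y), alphas)
print(f"best alpha={best_alpha}, validation MSE={best_err:.4f}")
```

Libraries such as scikit-learn wrap the same loop (plus cross-validation) in utilities like GridSearchCV; the point here is only that alpha is chosen by comparing validation scores, never fitted directly.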
Tutorial: FastAPI Playground
Let's CRUD some speedsters 🏃⚡ 🏃⚡ 🏃⚡ First of all, we need to read the settings from the .env file to set up the database connection. Inside the lightning package (aka our FastAPI application), create the file config.py with the code:
    # ~/starlabs/lightning/config.py
    # Pydantic v1 import; in Pydantic v2, BaseSettings moved to the pydantic-settings package:
    # from pydantic_settings import BaseSettings
    from pydantic import BaseSettings

    class DBSettings(BaseSettings):
        username: str
        password: str
        database: str
        host: str
        port: str

    class (…)
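For readers without Pydantic at hand, here is a rough stdlib-only sketch of the same step: read KEY=VALUE pairs from .env text and fill a settings object with the five fields from the excerpt. It is not how BaseSettings works internally; the parser, the lowercasing of keys, and the example values are assumptions for illustration only.

```python
from dataclasses import dataclass, fields


@dataclass
class DBSettings:
    # Same five fields as in the excerpt, all kept as str (port included).
    username: str
    password: str
    database: str
    host: str
    port: str


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and # comments.
    Keys are lowercased and surrounding quotes are stripped from values (assumed conventions)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip().lower()] = value.strip().strip('"').strip("'")
    return values


def load_settings(text: str) -> DBSettings:
    """Build DBSettings from .env text, failing loudly if a field is missing."""
    values = parse_env(text)
    missing = [f.name for f in fields(DBSettings) if f.name not in values]
    if missing:
        raise ValueError(f"missing settings: {missing}")
    return DBSettings(**{f.name: values[f.name] for f in fields(DBSettings)})


# Hypothetical .env contents, for illustration only.
example_env = """
# database connection
USERNAME=speedster
PASSWORD="secret"
DATABASE=starlabs
HOST=localhost
PORT=5432
"""

settings = load_settings(example_env)
print(settings.host, settings.port)
```

Raising on missing keys mirrors the idea behind typed settings classes: a misconfigured environment should fail at startup rather than at the first database query.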
26 Useful Python Snippets for Lazy Developers
Here are some of my most useful code snippets that will definitely make your life easier as a programmer!
Apache Kafka in Python: How to Stream Data With Producers and Consumers
Stream chat data by writing a Kafka producer and consumer from scratch. In a world of big data, a reliable streaming platform is a must, and Apache Kafka is the way to go. Today's article shows you how to work with Kafka producers and consumers in Python. You should have Zookeeper and Kafka configured through Docker. If that's not the case, read this article or watch this video before proceeding. (…)
The Ultimate Face-off: Flask vs. FastAPI
Choosing a framework is not easy, and that's why I'm here to help you get rid of the headache. Why should we even compare Flask and FastAPI? They are similar: both are stripped-down Python microframeworks without bloated bells and whistles, which means faster development and more flexibility. Both are also used for building APIs and web applications. They are also different. Flask is (…)
How to Update Excel Files Using Python
Learn the basics of the openpyxl package and create scripts to improve your daily work. Introduction: openpyxl is a Python library that lets you read and write Excel files with simple code, enabling people to improve their work performance. When I say improve performance, allow me to explain why: I have 13+ years in the IT industry, and I can say that many of them were spent behind Excel sheets (…)
This list of links is compiled by a bot, so please forgive any oddities ;-)