Algorithms
In general, an algorithm is a set of rules to be followed in calculations. In data science, the key algorithms are machine learning algorithms, which recognize patterns in data and predict future patterns. The right algorithm depends on the data available and the result required.
Big Data
Big data happens when a data set from different sources becomes very large and keeps growing, usually quickly. It is comprised of the three V’s:
- Volume – the amount of data is very large, often because it comes from many sources.
- Velocity – the data comes in quickly and almost constantly.
- Variety – the data could be text, video, audio, transactions, comments, emails, and more.
This data can then be used to make decisions, depending on the industry. In healthcare, big data can be used to examine illnesses and give a warning if there’s a chance there could be an outbreak of disease. Schools use big data to gather information on thousands of students.
Business Intelligence (BI)
The process of taking business data and using it to develop strategies for the future. In data science, BI can take large amounts of data and offer suggestions based on what is mined.
Coding Bootcamp
An immersive class that teaches a specific type of coding in a few months. In addition to teaching high-potential students, bootcamps have proven successful at helping current software engineers keep up to date on the latest technologies. Top coding bootcamps like Hack Reactor see their graduates command a higher salary than average software engineers.
Confounders
Variables that have an influence on other variables, leading to a possible incorrect interpretation of results. For instance, let’s say that a store is using data to analyze its sale of sandals. They may extract data that shows an increase in sales when the sandals are priced at a certain point. Confounders can be warm weather or placement in the store. The price may be a factor, but so can other variables. Confounders are unavoidable in some cases, but analysis can be designed to avoid them or reduce their impact on results.
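The sandal example can be sketched in a few lines of Python. Every number below is invented for illustration, and the names `records`, `mean_sales`, and `by_price` are hypothetical. The sketch compares sales at two prices twice: once ignoring the weather, and once within the same weather (a simple form of stratification).

```python
# Hypothetical weekly sandal sales: (price, weather, units_sold).
# All numbers are invented for illustration only.
records = [
    (20, "warm", 90), (20, "warm", 94), (20, "warm", 92), (20, "cold", 40),
    (30, "warm", 88), (30, "cold", 38), (30, "cold", 36), (30, "cold", 42),
]

def mean_sales(rows):
    """Average units sold across the given rows."""
    return sum(units for _, _, units in rows) / len(rows)

def by_price(rows, price):
    """Keep only the rows recorded at the given price."""
    return [r for r in rows if r[0] == price]

# Naive comparison: ignores weather entirely.
naive_gap = mean_sales(by_price(records, 20)) - mean_sales(by_price(records, 30))

# Stratified comparison: compare prices within the same weather, then average.
gaps = []
for weather in ("warm", "cold"):
    same_weather = [r for r in records if r[1] == weather]
    gaps.append(
        mean_sales(by_price(same_weather, 20)) - mean_sales(by_price(same_weather, 30))
    )
adjusted_gap = sum(gaps) / len(gaps)

print(f"naive gap: {naive_gap:.1f}, weather-adjusted gap: {adjusted_gap:.1f}")
```

In this made-up data the lower price happened to be used mostly in warm weeks, so the naive comparison credits the price with an effect that mostly belongs to the weather.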
Data Analyst
A person who takes raw data and transforms it into information that can be used by others for making decisions. Data analysis requires that data be inspected, cleaned, and molded to be of use. Think of a sculptor chipping away all the parts that aren’t an elephant in order to give you the elephant. Analysts are typically skilled in advanced Excel, SQL, Python, and digital marketing-related analytics.
Data Engineering
Those on the leading edge of data engineering are not only analyzing data, but are increasingly implementing solutions, run in software by machines, that change how businesses operate. Recommendation engines, fraud detection, real-time pricing, and bidding are just a few of the possibilities. These solutions often combine data from multiple sources and must run in near-real-time at scale, supporting thousands, if not millions, of simultaneous users, all while remaining secure and protecting individual privacy.
Data Mining
Not the extraction of data, but the process of discovering patterns in large quantities of data. Much like the real-life process of mining diamonds or gold from the earth, the most important task in data mining is to extract non-trivial nuggets from large amounts of data.
Data Science Bootcamp
A data science bootcamp uses a Python-based curriculum, real-world case studies, and machine learning concepts to prepare qualified students for a career as a Data Scientist. Top-ranked data science bootcamps, like Galvanize, have created a network of top-tier data scientists who work everywhere from early-stage startups to some of the most prestigious tech companies. Successful graduates can expect an average base salary of $97,875.
Data Science
Data science at its simplest is taking data, structured or unstructured, and getting any information you want or need from it. This can be done using algorithms, methods, processes, or systems. Data science is better thought of as a broad field with numerous subfields.
Data Scientist
The people who figure out how to extract the data that companies and organizations need to improve their business. They create the algorithms and methods to mine the data, find the patterns in it, and interpret those patterns to offer possible solutions or recommendations.
Data Visualization
The visual portion of data that makes it easier to understand. Visualization can be flow charts, diagrams, or any visual representation that makes it easier to digest the process of the data being extracted and what it means. Data visualizations can demystify even the most complex data sets, using an easy-to-digest format to reveal important trends, influential factors, and patterns of behavior.
Database
A collection of data that can be easily accessed and managed. Databases come in a variety of types:
- Cloud – a database that has been designed to work in a virtual environment.
- Distributed – portions of the database are stored in various physical locations, such as multiple computers.
- Graph – a NoSQL database that uses graph theory to organize and find relationships in data.
- NoSQL – designed to handle large volumes of data that cannot be handled by relational databases, such as unstructured data.
- Object-oriented – organized based on objects.
- Relational – made up of tables with predefined columns, which can be related to one another and extended as needs grow. These are the most common and are usually queried with SQL; examples include MySQL and PostgreSQL.
Deep Learning
A subset of machine learning that uses artificial neural networks with many layers. Deep learning passes data through these layers in sequence, with each layer learning to recognize more abstract patterns than the one before.
Graph Theory
The study of graphs: data structured as nodes, which hold information, and edges, which are the connections between nodes.
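As a rough illustration, a graph can be stored in Python as a dictionary that maps each node to the nodes it connects to. The names in `graph` and the helper `reachable` are made up for this sketch, which uses a breadth-first search to find every node connected to a starting node.

```python
from collections import deque

# Hypothetical graph: nodes are people, edges are "knows" relationships.
graph = {
    "Ana": ["Ben", "Cy"],
    "Ben": ["Ana", "Dee"],
    "Cy": ["Ana"],
    "Dee": ["Ben"],
    "Eli": [],  # an isolated node with no edges
}

def reachable(graph, start):
    """Return the set of nodes reachable from `start` via breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

print(sorted(reachable(graph, "Ana")))  # ['Ana', 'Ben', 'Cy', 'Dee']
```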
Linear Regression
A supervised machine learning algorithm that finds the best linear relationship between two variables, one dependent and one independent. The least-squares method behind linear regression dates back to the early 1800s. It’s a great way to immerse yourself in traditional statistics because it’s been so extensively analyzed and used by statisticians.
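A minimal sketch of fitting a line with ordinary least squares, written in plain Python so each step is visible. The function name `fit_line` and the sample data are assumptions for illustration; in practice a library such as NumPy or scikit-learn would usually do this work.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data lying exactly on y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```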
Machine Learning
An application of artificial intelligence where the machine learns from data and improves with experience, without being explicitly programmed with a specific set of instructions.
Multivariate Regression
Similar to linear regression, but this algorithm (often called multiple linear regression) finds the linear relationship between multiple independent variables and a dependent variable.
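One way to sketch this without any libraries is to solve the least-squares "normal equations" directly. The helpers `solve` and `fit_linear` and the data are hypothetical; real projects would normally rely on NumPy or scikit-learn instead of hand-written elimination.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution, from the last row up.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_linear(rows, ys):
    """Least-squares coefficients [intercept, w1, w2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Invented data generated from y = 1 + 2*x1 + 3*x2
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, 4, 6, 8]
coefficients = fit_linear(rows, ys)
print([round(c, 6) for c in coefficients])
```

Because the invented data contains no noise, the recovered coefficients should match the intercept 1 and the weights 2 and 3 up to floating-point rounding.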
Neural Network Modeling
Algorithms loosely modeled on the human brain that can recognize patterns in data.
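The sketch below is not a full neural network. It is a single artificial neuron (a perceptron), the simplest building block of one, learning the logical AND pattern. The function names, the learning rate, and the number of epochs are choices made for this illustration only.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs exceeds zero, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train(samples, epochs=20, rate=1.0):
    """Perceptron learning rule: nudge the weights after each misclassified example."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Logical AND: the output is 1 only when both inputs are 1.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(and_gate)
print([predict(weights, bias, inputs) for inputs, _ in and_gate])  # [0, 0, 0, 1]
```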
Optimization Algorithms
Useful in deep learning, these algorithms iteratively update model parameters to minimize the value of the loss function.
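Gradient descent is one common optimization algorithm. The sketch below applies it to a deliberately simple, hypothetical loss function with a single parameter `w`; the learning rate and step count are arbitrary values chosen for the example.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the loss."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

# Hypothetical loss with a known minimum at w = 3
def loss(w):
    return (w - 3) ** 2

def loss_gradient(w):
    """Derivative of the loss with respect to w."""
    return 2 * (w - 3)

best_w = gradient_descent(loss_gradient, start=0.0)
print(round(best_w, 4), loss(best_w) < loss(0.0))
```

Deep learning uses the same idea, except that the parameter is millions of weights and the gradient is computed automatically.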
Python
One of the most popular programming languages in data science, for a variety of reasons: it’s easy to learn, easy to read, has extensive libraries, and has functionality that works well for data science. With its easy-to-understand syntax and powerful data analysis capabilities, Python can be applied to a wide variety of fields.
Random Forest Modeling
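An algorithm that builds a large collection of decision trees (the forest) that operate together and can be used for classification and regression problems. Each tree is trained on a random sample of the data, and the forest combines the trees’ predictions by voting or averaging. Random forest works on the theory that the whole is greater than the sum of its parts.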
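A heavily simplified sketch of the idea follows. Instead of full decision trees it uses one-split "stumps", each trained on a random resample of the data and one randomly chosen feature, and then takes a majority vote. This is a stand-in to show the ensemble principle, not a production random forest; all names and data are invented, and a real project would use a library implementation.

```python
import random
from collections import Counter

def train_stump(sample, feature):
    """One-split tree: threshold halfway between the two class means of one feature."""
    by_label = {0: [], 1: []}
    for features, label in sample:
        by_label[label].append(features[feature])
    if not by_label[0] or not by_label[1]:
        only = 0 if by_label[0] else 1  # the resample held a single class
        return lambda features: only
    mean0 = sum(by_label[0]) / len(by_label[0])
    mean1 = sum(by_label[1]) / len(by_label[1])
    threshold = (mean0 + mean1) / 2
    high_label = 1 if mean1 > mean0 else 0
    return lambda features: high_label if features[feature] > threshold else 1 - high_label

def train_forest(data, n_trees=25, seed=0):
    """Train many stumps, each on its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]  # sample with replacement
        feature = rng.randrange(n_features)        # random feature for this tree
        forest.append(train_stump(sample, feature))
    return forest

def predict(forest, features):
    """Majority vote across all trees."""
    votes = Counter(tree(features) for tree in forest)
    return votes.most_common(1)[0][0]

# Invented two-feature data: class 0 has small values, class 1 has large values.
data = [((1, 2), 0), ((2, 1), 0), ((3, 3), 0), ((2, 2), 0), ((1, 3), 0),
        ((7, 8), 1), ((8, 7), 1), ((9, 9), 1), ((8, 8), 1), ((7, 9), 1)]
forest = train_forest(data)
print(predict(forest, (0, 0)), predict(forest, (10, 10)))
```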
Regression
Examines the relationship between the dependent and independent variables and how this affects the outcome. The purpose of regression analysis is to predict an outcome based on historical data.
Software Engineering
Software engineering is the development of applications, systems, and programs for computers, according to the Bureau of Labor Statistics. More deeply, software engineering is designing, constructing, modifying, and testing applications based on user needs. Engineering is used for large and complex software systems where simple programming would not work.
Supervised Learning
A data scientist provides the machine with labeled examples, that is, inputs paired with the correct answers, so the machine can learn to predict the answer for new data.
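A small sketch of the idea, using a nearest-neighbor classifier as one example of a supervised method. The measurements, labels, and the function name `classify` are all invented for illustration.

```python
# Labeled training examples: (height_cm, weight_kg) -> label. Values are invented.
training = [
    ((20, 4), "cat"), ((25, 5), "cat"),
    ((60, 30), "dog"), ((55, 25), "dog"),
]

def classify(point, training):
    """Predict the label of the closest labeled example (1-nearest neighbor)."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    _, label = min(training, key=lambda example: squared_distance(point, example[0]))
    return label

print(classify((22, 4), training))   # cat
print(classify((58, 28), training))  # dog
```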
Structured Query Language (SQL)
A query language used to request information from a database, as well as to insert, update, and delete data. Some industry surveys rank SQL among the most in-demand coding languages in the United States.
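The basic SQL operations can be tried out with Python's built-in `sqlite3` module and a throwaway in-memory database. The `students` table and its contents are made up for this sketch.

```python
import sqlite3

# An in-memory database that disappears when the connection closes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, grade INTEGER)")

# Insert rows.
conn.executemany(
    "INSERT INTO students (name, grade) VALUES (?, ?)",
    [("Ana", 90), ("Ben", 72), ("Cy", 85)],
)

# Change an existing row, then remove another.
conn.execute("UPDATE students SET grade = ? WHERE name = ?", (78, "Ben"))
conn.execute("DELETE FROM students WHERE name = ?", ("Cy",))

# Request information from the table.
rows = conn.execute("SELECT name, grade FROM students ORDER BY grade DESC").fetchall()
conn.close()
print(rows)  # [('Ana', 90), ('Ben', 78)]
```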
Unsupervised Learning
A data scientist does not provide the machine with a set of answers. Instead, the machine goes through the data itself and finds structure, such as groups of similar items.
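Clustering is a typical unsupervised task. The sketch below runs a bare-bones k-means on a handful of invented numbers; no labels are supplied, only starting guesses for the two group centers. The function name `k_means_1d` and the starting centers are assumptions made for this example.

```python
def k_means_1d(values, centers, iterations=10):
    """Group numbers around the nearest center, then move each center to its group's mean."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # An empty group keeps its old center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

values = [1, 2, 3, 10, 11, 12]
centers, groups = k_means_1d(values, centers=[0.0, 5.0])
print(centers, groups)  # [2.0, 11.0] [[1, 2, 3], [10, 11, 12]]
```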
Vectors
Used to store objects or collections of objects in an organized manner. The size can be increased or decreased based on need.
