Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" (that is, to progressively improve performance on a specific task) with data, without being explicitly programmed.
The name machine learning was coined in 1959 by Arthur Samuel. Having evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms overcome strictly static program instructions by making data-driven predictions or decisions, through building a model from sample inputs. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is difficult or infeasible; example applications include email filtering, detection of network intruders or malicious insiders working towards a data breach, optical character recognition (OCR), learning to rank, and computer vision.
Machine learning is closely related to (and often overlaps with) computational statistics, which also focuses on making predictions through the use of computers. It has strong ties to mathematical optimization, which delivers methods, theory and application domains to the field. Machine learning is sometimes conflated with data mining, where the latter subfield focuses more on exploratory data analysis and is known as unsupervised learning. Machine learning can also be unsupervised and be used to learn and establish baseline behavioral profiles for various entities, which are then used to find meaningful anomalies.
Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that lend themselves to prediction; in commercial use, this is known as predictive analytics. These analytical models allow researchers, data scientists, engineers, and analysts to "produce reliable and repeatable results and decisions" and uncover "hidden insights" through learning from historical relationships and trends in the data.
Overview
Tom M. Mitchell provided a widely quoted, more formal definition of the algorithms studied in the machine learning field: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." This definition of the tasks with which machine learning is concerned offers a fundamentally operational definition rather than defining the field in cognitive terms. It follows Alan Turing's proposal in his paper "Computing Machinery and Intelligence", in which the question "Can machines think?" is replaced with the question "Can machines do what we (as thinking entities) can do?". Turing's proposal sets out the various characteristics that a thinking machine could possess and the various implications of constructing one.
Machine learning tasks
Machine learning tasks are usually classified into two broad categories, depending on whether there is a learning "signal" or "feedback" available to the learning system:
- Supervised learning: The computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs. As special cases, the input signal can be only partially available, or restricted to special feedback:
  - Semi-supervised learning: the computer is given only an incomplete training signal: a training set with some (often many) of the target outputs missing.
  - Active learning: the computer can only obtain training labels for a limited set of instances (based on a budget), and also has to optimize its choice of objects to acquire labels for. When used interactively, these can be presented to the user for labeling.
  - Reinforcement learning: training data (in the form of rewards and punishments) is given only as feedback to the program's actions in a dynamic environment, such as driving a vehicle or playing a game against an opponent.
- Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).
Machine learning applications
Another categorization of machine learning tasks arises when one considers the desired output of a machine-learned system:
- In classification, inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one or more (multi-label classification) of these classes. This is typically tackled in a supervised way. Spam filtering is an example of classification, where the inputs are email (or other) messages and the classes are "spam" and "not spam".
- In regression, also a supervised problem, the outputs are continuous rather than discrete.
- In clustering, a set of inputs is to be divided into groups. Unlike in classification, the groups are not known beforehand, making this typically an unsupervised task.
- Density estimation finds the distribution of inputs in some space.
- Dimensionality reduction simplifies inputs by mapping them into a lower-dimensional space. Topic modeling is a related problem, where a program is given a list of human language documents and is tasked to find out which documents cover similar topics.
Among other categories of machine learning problems, learning to learn learns its own inductive bias based on previous experience. Developmental learning, elaborated for robot learning, generates its own sequences (also called a curriculum) of learning situations to cumulatively acquire repertoires of novel skills through autonomous self-exploration and social interaction with human teachers, using guidance mechanisms such as active learning, maturation, motor synergies, and imitation.
History and relationships to other fields
Arthur Samuel, an American pioneer in the field of computer gaming and artificial intelligence, coined the term "machine learning" in 1959 while at IBM. As a scientific endeavour, machine learning grew out of the quest for artificial intelligence. Already in the early days of AI as an academic discipline, some researchers were interested in having machines learn from data. They attempted to approach the problem with various symbolic methods, as well as what were then termed "neural networks"; these were mostly perceptrons and other models that were later found to be reinventions of the generalized linear models of statistics. Probabilistic reasoning was also employed, especially in automated medical diagnosis.
However, an increasing emphasis on the logical, knowledge-based approach caused a rift between AI and machine learning. Probabilistic systems were plagued by theoretical and practical problems of data acquisition and representation. By 1980, expert systems had come to dominate AI, and statistics was out of favor. Work on symbolic/knowledge-based learning did continue within AI, leading to inductive logic programming, but the more statistical line of research was now outside the field of AI proper, in pattern recognition and information retrieval. Neural network research had been abandoned by AI and computer science around the same time. This line, too, was continued outside the AI/CS field, as "connectionism", by researchers from other disciplines including Hopfield, Rumelhart and Hinton. Their main success came in the mid-1980s with the reinvention of backpropagation.
Machine learning, reorganized as a separate field, started to flourish in the 1990s. The field changed its goal from achieving artificial intelligence to tackling solvable problems of a practical nature. It shifted focus away from the symbolic approaches it had inherited from AI, and toward methods and models borrowed from statistics and probability theory. It also benefited from the increasing availability of digitized information, and the ability to distribute it via the Internet.
Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on known properties learned from the training data, data mining focuses on the discovery of (previously) unknown properties in the data (this is the analysis step of knowledge discovery in databases). Data mining uses many machine learning methods, but with different goals; on the other hand, machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy. Much of the confusion between these two research communities (which often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the key task is the discovery of previously unknown knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by other supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.
Machine learning also has intimate ties to optimization: many learning problems are formulated as minimization of some loss function on a training set of examples. Loss functions express the discrepancy between the predictions of the model being trained and the actual problem instances (for example, in classification, one wants to assign a label to instances, and models are trained to correctly predict the pre-assigned labels of a set of examples). The difference between the two fields arises from the goal of generalization: while optimization algorithms can minimize the loss on a training set, machine learning is concerned with minimizing the loss on unseen samples.
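As a minimal sketch of this distinction, the following Python fragment minimizes a squared loss on a training set by gradient descent and then measures the same loss on held-out points. The data, the linear model, the learning rate and all function names are illustrative assumptions, not something prescribed by the article.

```python
def mean_squared_loss(w, b, data):
    """Average squared mismatch between predictions w*x + b and targets y."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fit_line(train, lr=0.05, steps=2000):
    """Minimize the training loss of a line y = w*x + b by gradient descent."""
    w, b = 0.0, 0.0
    n = len(train)
    for _ in range(steps):
        # Partial derivatives of the mean squared loss with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data, roughly following y = 2x + 1.
train = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]
unseen = [(4.0, 9.1), (5.0, 10.9)]

w, b = fit_line(train)
train_loss = mean_squared_loss(w, b, train)    # what the optimizer minimized
unseen_loss = mean_squared_loss(w, b, unseen)  # what machine learning cares about
print(w, b, train_loss, unseen_loss)
```

The optimizer only ever sees `train`; the loss on `unseen` is the quantity that reflects generalization, and on this toy data it comes out larger than the training loss.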
Relationship with statistics
Machine learning and statistics are closely related fields. According to Michael I. Jordan, the ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics. He also suggested the term data science as a placeholder name for the overall field.
Leo Breiman distinguished two statistical modeling paradigms: the data model and the algorithmic model, where "algorithmic model" means more or less machine learning algorithms such as random forests.
Some statisticians have adopted methods from machine learning, leading to a combined field that they call statistical learning.
Theory
A core objective of a learner is to generalize from its experience. Generalization in this context is the ability of a learning machine to perform accurately on new, unseen examples or tasks after having experienced a learning data set. The training examples come from some generally unknown probability distribution (considered representative of the space of occurrences) and the learner has to build a general model about this space that enables it to produce sufficiently accurate predictions in new cases.
The computational analysis of machine learning algorithms and their performance is a branch of theoretical computer science known as computational learning theory. Because training sets are finite and the future is uncertain, learning theory usually does not yield guarantees of the performance of algorithms. Instead, probabilistic bounds on the performance are quite common. The bias-variance decomposition is one way to quantify generalization error.
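For reference, the bias-variance decomposition is commonly written as follows for squared error. The notation is an assumption introduced here, not taken from the article: the target is $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, $\hat{f}$ is the model fitted on a random training set, and the expectation is over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Roughly, a hypothesis that is too simple tends to have a large bias term, while one that is too complex tends to have a large variance term.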
For the best performance in the context of generalization, the complexity of the hypothesis should match the complexity of the function underlying the data. If the hypothesis is less complex than the function, then the model has underfit the data. If the complexity of the model is increased in response, then the training error decreases. But if the hypothesis is too complex, then the model is subject to overfitting and generalization will be poorer.
In addition to performance bounds, computational learning theorists study the time complexity and feasibility of learning. In computational learning theory, a computation is considered feasible if it can be done in polynomial time. There are two kinds of time complexity results. Positive results show that a certain class of functions can be learned in polynomial time. Negative results show that certain classes cannot be learned in polynomial time.
Approaches
Decision tree learning
Decision tree learning uses a decision tree as a predictive model, which maps observations about an item to conclusions about the item's target value.
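To make the idea concrete, here is a minimal sketch of the smallest possible decision tree, a single-split "decision stump" on one numeric feature. Real decision tree learners grow many such splits recursively and use criteria such as information gain; the function names and the toy observations below are illustrative assumptions.

```python
def fit_stump(samples):
    """Choose the threshold on one numeric feature that misclassifies the
    fewest training samples. `samples` is a list of (value, label) pairs
    with boolean labels."""
    values = sorted({v for v, _ in samples})
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    # Candidate thresholds lie halfway between consecutive distinct values.
    candidates = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    best = None  # (errors, threshold, label_above)
    for threshold in candidates:
        for label_above in (True, False):
            errors = sum(
                (label_above if v > threshold else not label_above) != label
                for v, label in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_above)
    return best[1], best[2]


def predict(stump, value):
    """Map an observation to a conclusion about its target value."""
    threshold, label_above = stump
    return label_above if value > threshold else not label_above


# Hypothetical observations: (hours of sunshine, went hiking).
observations = [(1.0, False), (2.0, False), (3.0, False),
                (6.0, True), (7.0, True), (9.0, True)]
stump = fit_stump(observations)
print(stump, predict(stump, 8.0), predict(stump, 2.0))
```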
Association rule learning
Association rule learning is a method for discovering interesting relations between variables in large databases.
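Two measures commonly used to judge how "interesting" a rule is are support and confidence. The sketch below computes both for a rule such as "bread implies butter"; the transactions and function names are made up for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(rule_support, rule_confidence)
```

A rule-mining algorithm would search over many candidate item sets and keep the rules whose support and confidence exceed chosen thresholds.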
Artificial neural networks
An artificial neural network (ANN) learning algorithm, usually called "neural network" (NN), is a learning algorithm that is vaguely inspired by biological neural networks. Computations are structured in terms of an interconnected group of artificial neurons, processing information using a connectionist approach to computation. Modern neural networks are non-linear statistical data modeling tools. They are usually used to model complex relationships between inputs and outputs, to find patterns in data, or to capture the statistical structure in an unknown joint probability distribution between observed variables.
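The basic building block can be sketched with a single artificial neuron trained by the classic perceptron rule. This is deliberately the simplest historical case, not a modern multi-layer network; the logical-OR data and the function names are illustrative assumptions.

```python
def step(z):
    """Threshold activation: the neuron fires (1) when its input is positive."""
    return 1 if z > 0 else 0


def predict(weights, bias, inputs):
    """Weighted sum of the inputs followed by the threshold activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: nudge the weights whenever the output is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Logical OR is linearly separable, so a single neuron can learn it.
or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(or_samples)
outputs = [predict(weights, bias, inputs) for inputs, _ in or_samples]
print(weights, bias, outputs)
```

Networks of many such units with smooth non-linear activations, trained by backpropagation, can represent relationships that no single neuron can.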
Deep learning
Falling hardware prices and the development of GPUs for personal use in recent years have contributed to the development of the concept of deep learning, which consists of multiple hidden layers in an artificial neural network. This approach tries to model the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are computer vision and speech recognition.
Inductive logic programming
Inductive logic programming (ILP) is an approach to rule learning using logic programming as a uniform representation for input examples, background knowledge, and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program that entails all positive and no negative examples. Inductive programming is a related field that considers any kind of programming language for representing hypotheses (and not only logic programming), such as functional programs.
Support vector machines
Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.
Clustering
Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that observations within the same cluster are similar according to some predesignated criterion or criteria, while observations drawn from different clusters are dissimilar. Different clustering techniques make different assumptions on the structure of the data, often defined by some similarity metric and evaluated, for example, by internal compactness (similarity between members of the same cluster) and separation between different clusters. Other methods are based on estimated density and graph connectivity. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis.
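One widely used technique of the compactness-based kind is k-means, sketched below under simplifying assumptions: squared Euclidean distance as the similarity metric, a fixed number of iterations, and initial centroids supplied by the caller. The points and function names are illustrative.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical 2-D observations forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[1]])
print(centroids)
print(clusters)
```

No labels are used anywhere: the grouping emerges only from the distances between observations.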
Bayesian networks
A Bayesian network, belief network or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning.
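The disease and symptom example can be sketched with the smallest possible network, one edge from a disease node to a symptom node, where inference reduces to Bayes' rule. The probabilities below are invented for illustration only.

```python
def posterior_disease(prior, p_symptom_given_disease, p_symptom_given_healthy):
    """P(disease | symptom observed) in a two-node network
    Disease -> Symptom, computed with Bayes' rule."""
    joint_disease = prior * p_symptom_given_disease
    joint_healthy = (1 - prior) * p_symptom_given_healthy
    return joint_disease / (joint_disease + joint_healthy)


# Hypothetical numbers: 1% prevalence, the symptom shows up in 90% of
# patients with the disease and in 5% of patients without it.
posterior = posterior_disease(
    prior=0.01,
    p_symptom_given_disease=0.9,
    p_symptom_given_healthy=0.05,
)
print(posterior)
```

In a larger network the same computation is organized along the graph, so that conditional independencies keep inference tractable.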
Reinforcement learning
Reinforcement learning is concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states. Reinforcement learning differs from the supervised learning problem in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected.
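As one concrete instance, the sketch below uses tabular Q-learning, a standard reinforcement learning algorithm that the article itself does not name, on a made-up five-state corridor with a reward only at the right end. The environment, hyperparameters and the purely random exploration policy are all assumptions chosen to keep the example short.

```python
import random

N_STATES = 5            # states 0..4 on a line; state 4 is the goal
ACTIONS = (-1, +1)      # action index 0 moves left, index 1 moves right
GOAL = N_STATES - 1


def env_step(state, action_index):
    """Move within the corridor; the only reward is for reaching the goal."""
    next_state = min(max(state + ACTIONS[action_index], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from reward feedback alone. Actions are chosen
    uniformly at random; Q-learning is off-policy, so it can still learn
    the values of the greedy policy from such exploratory behavior."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.randrange(2)
            next_state, reward = env_step(state, action)
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# The policy maps each non-terminal state to its highest-valued action.
policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)
```

The agent is never told that "right" is the correct action in any state; the policy is derived entirely from the delayed reward.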
Representation learning
Several learning algorithms, mostly unsupervised learning algorithms, aim at discovering better representations of the inputs provided during training. Classical examples include principal components analysis and cluster analysis. Representation learning algorithms often attempt to preserve the information in their input but transform it in a way that makes it useful, often as a pre-processing step before performing classification or predictions. This allows the inputs coming from the unknown data-generating distribution to be reconstructed, while not necessarily being faithful for configurations that are implausible under that distribution.
Manifold learning algorithms attempt to do so under the constraint that the learned representation is low-dimensional. Sparse coding algorithms attempt to do so under the constraint that the learned representation is sparse (has many zeros). Multilinear subspace learning algorithms aim to learn low-dimensional representations directly from tensor representations for multidimensional data, without reshaping them into (high-dimensional) vectors. Deep learning algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features. It has been argued that an intelligent machine is one that learns a representation that disentangles the underlying factors of variation that explain the observed data.
Similarity and metric learning
In this problem, the learning machine is given pairs of examples that are considered similar and pairs of less similar objects. It then needs to learn a similarity function (or a distance metric function) that can predict whether new objects are similar. It is sometimes used in recommendation systems.
Sparse dictionary learning
In this method, a datum is represented as a linear combination of basis functions, and the coefficients are assumed to be sparse. Let x be a d-dimensional datum and D be a d by n matrix, where each column of D represents a basis function. r is the coefficient vector used to represent x using D. Mathematically, sparse dictionary learning means solving $x \approx D r$ where r is sparse. Generally speaking, n is assumed to be larger than d to allow the freedom for a sparse representation.
Learning a dictionary along with sparse representations is strongly NP-hard and also difficult to solve approximately. A popular heuristic method for sparse dictionary learning is K-SVD.
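One common way to write this as an optimization problem, for a single datum and using the symbols x, D and r from the text, is shown below. The sparsity budget $T_0$ (the maximum number of non-zero coefficients) is a symbol assumed here for illustration, and other formulations, for example with an $\ell_1$ penalty, are also in use.

```latex
\min_{D,\, r} \; \lVert x - D r \rVert_2^2
\quad \text{subject to} \quad \lVert r \rVert_0 \le T_0
```

Here $\lVert r \rVert_0$ counts the non-zero entries of r. In practice the objective is summed over all training data, each datum having its own coefficient vector while D is shared.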
Sparse dictionary learning has been applied in several contexts. In classification, the problem is to determine which class a previously unseen datum belongs to. Suppose a dictionary for each class has already been built. Then a new datum is associated with the class whose dictionary gives it the best sparse representation. Sparse dictionary learning has also been applied in image de-noising. The key idea is that a clean image patch can be sparsely represented by an image dictionary, but the noise cannot.
Genetic algorithms
A genetic algorithm (GA) is a search heuristic that mimics the process of natural selection, and uses methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms found some uses in the 1980s and 1990s. Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.
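The loop of selection, crossover and mutation can be sketched on a toy problem, maximizing the number of ones in a bit string. The problem, the truncation selection scheme, the elitism step and every parameter value are assumptions made for illustration.

```python
import random


def fitness(genotype):
    """Toy objective: the number of ones in the bit string."""
    return sum(genotype)


def evolve(length=20, population_size=30, generations=40,
           mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    initial_best = max(map(fitness, population))
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]  # truncation selection
        children = [population[0][:]]                 # elitism: keep the best
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)            # one-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, initial_best


best, initial_best = evolve()
print(fitness(best), initial_best)
```

Because the best genotype is copied unchanged into each new generation, the best fitness can never decrease from one generation to the next.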
Rule-based machine learning
Rule-based machine learning is a general term for any machine learning method that identifies, learns, or evolves "rules" to store, manipulate or apply knowledge. The defining characteristic of a rule-based machine learner is the identification and utilization of a set of relational rules that collectively represent the knowledge captured by the system. This is in contrast to other machine learners that commonly identify a singular model that can be universally applied to any instance in order to make a prediction. Rule-based machine learning approaches include learning classifier systems, association rule learning, and artificial immune systems.
Learning classifier systems
Learning classifier systems (LCS) are a family of rule-based machine learning algorithms that combine a discovery component (e.g. typically a genetic algorithm) with a learning component (performing either supervised learning, reinforcement learning, or unsupervised learning). They seek to identify a set of context-dependent rules that collectively store and apply knowledge in a piecewise manner in order to make predictions.
Applications
Notable applications of machine learning include the following.
In 2006, the online movie company Netflix held the first "Netflix Prize" competition to find a program to better predict user preferences and improve the accuracy of its existing Cinematch movie recommendation algorithm by at least 10%. A joint team made up of researchers from AT&T Labs-Research in collaboration with the teams Big Chaos and Pragmatic Theory built an ensemble model to win the Grand Prize in 2009 for $1 million. Shortly after the prize was awarded, Netflix realized that viewers' ratings were not the best indicators of their viewing patterns ("everything is a recommendation") and changed its recommendation engine accordingly.
In 2010 The Wall Street Journal wrote about the firm Rebellion Research and its use of machine learning to predict the financial crisis.
In 2012, Vinod Khosla, a co-founder of Sun Microsystems, predicted that 80% of medical doctors' jobs would be lost in the next two decades to automated machine learning medical diagnostic software.
In 2014, it was reported that a machine learning algorithm had been applied in art history to study fine art paintings, and that it may have revealed previously unrecognized influences between artists.
Model assessments
Although machine learning has been very transformative in some fields, effective machine learning is difficult because finding patterns is hard and often not enough training data are available; as a result, machine-learning programs often fail to deliver.
Classification machine learning models can be validated by accuracy estimation techniques like the holdout method, which splits the data into a training set and a test set (conventionally 2/3 training set and 1/3 test set) and evaluates the performance of the trained model on the test set. In comparison, the k-fold cross-validation method randomly splits the data into k subsets, where k-1 subsets are used to train the model while the k-th subset is used to test the predictive ability of the trained model. In addition to the holdout and cross-validation methods, the bootstrap, which samples n instances with replacement from the dataset, can be used to assess model accuracy.
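The three resampling schemes can be sketched with the standard library alone. The function names, the fixed seeds and the twelve-item dataset are illustrative assumptions; training and scoring a model on each split is left out.

```python
import random


def holdout_split(data, test_fraction=1 / 3, seed=0):
    """Shuffle once, then hold out a fraction of the data for testing."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_splits(data, k, seed=0):
    """Yield k (train, test) pairs; each subset serves as the test set once."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [item for j, fold in enumerate(folds) if j != i
                 for item in fold]
        yield train, folds[i]


def bootstrap_sample(data, seed=0):
    """Draw n = len(data) instances with replacement."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in data]


data = list(range(12))          # stand-in for twelve labeled examples
train, test = holdout_split(data)
folds = list(k_fold_splits(data, k=4))
sample = bootstrap_sample(data)
print(len(train), len(test), len(folds), len(sample))
```

In the k-fold case the reported accuracy is typically the average over the k test subsets.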
In addition to overall accuracy, investigators frequently report sensitivity and specificity, meaning True Positive Rate (TPR) and True Negative Rate (TNR) respectively. Similarly, investigators sometimes report the False Positive Rate (FPR) as well as the False Negative Rate (FNR). However, these rates are ratios that fail to reveal their numerators and denominators. The Total Operating Characteristic (TOC) is an effective method to express a model's diagnostic ability. TOC shows the numerators and denominators of the previously mentioned rates, thus TOC provides more information than the commonly used Receiver Operating Characteristic (ROC) and ROC's associated Area Under the Curve (AUC).
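The four rates follow directly from the counts in a confusion matrix, as in the sketch below. The counts are invented; the second call illustrates the point about ratios, since a study ten times larger yields the same rates.

```python
def rates(tp, fp, tn, fn):
    """Rates derived from true/false positive and negative counts."""
    return {
        "TPR": tp / (tp + fn),  # sensitivity
        "TNR": tn / (tn + fp),  # specificity
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }


# Hypothetical counts from a small and a ten times larger evaluation.
small_study = rates(tp=40, fp=5, tn=45, fn=10)
large_study = rates(tp=400, fp=50, tn=450, fn=100)
print(small_study)
print(large_study)
```

Both calls give a TPR of 0.8 and a TNR of 0.9, so the rates alone do not show how many cases each evaluation was based on.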
Ethics
Machine learning poses a host of ethical questions. Systems which are trained on datasets collected with biases may exhibit these biases upon use (algorithmic bias), thus digitizing cultural prejudices. For example, using job hiring data from a firm with racist hiring policies may lead to a machine learning system duplicating the bias by scoring job applicants against similarity to previous successful applicants. Responsible collection of data and documentation of the algorithmic rules used by a system is thus a critical part of machine learning.
Because language contains biases, machines trained on language corpora will necessarily also learn bias.
Other forms of ethical challenges, not related to personal biases, are seen more in health care. There are concerns among health care professionals that these systems might not be designed in the public's interest, but as income-generating machines. This is especially true in the United States, where there is a perpetual ethical dilemma of improving health care while also increasing profits. For example, an algorithm could be designed to provide patients with unnecessary tests or medication in which the algorithm's proprietary owners hold stakes. There is huge potential for machine learning in health care to provide professionals with a great tool to diagnose, medicate, and even plan recovery paths for patients, but this will not happen until the personal biases mentioned previously, and these "greed" biases, are addressed.
Software
Software suites containing a variety of machine learning algorithms include the following:
Free and open source software
Proprietary software with free and open-source editions
Proprietary software
Journals
- Journal of Machine Learning Research
- Machine Learning
- Neural Computation
Conferences
- Conference on Neural Information Processing Systems
- International Conference on Machine Learning
- International Conference on Learning Representations
- Open Data Science Conference (ODSC)
External links
- International Machine Learning Society
- Popular online course by Andrew Ng, at Coursera. It uses GNU Octave. The course is a free version of Stanford University's actual course taught by Ng, whose lectures are also available for free.
- mloss is an academic database of open-source machine learning software.
- Google's Machine Learning Crash Course. This is a free course on machine learning through the use of TensorFlow.
- A popular book on Machine Intelligence in Design Automation.
- Draft copy of Andrew Ng's book.
- An online copy of the book Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
Source of the article : Wikipedia