What should we do? Machines are bringing us new knowledge we cannot understand

Netease Technology News, April 21: The foreign technology website Backchannel writes that our machines now hold knowledge we cannot comprehend. We increasingly rely on machines that build their own models to reach conclusions, but those models often exceed human understanding and "think" about the world differently than we do.

The following is the main content of the article:

"The availability of massive amounts of data, and of the statistical tools to analyze them, brings a whole new way of understanding the world. Correlation replaces causation, and science can make progress even without coherent models, unified theories, or mechanistic explanations."

So wrote Chris Anderson, former editor-in-chief of Wired, in 2008. At the time it triggered a fierce debate. For example, an article published in a journal of molecular biology asked, "... if we stop looking for models and hypotheses, are we still really doing science?" The answer was clearly supposed to be "no."

But now, less than 10 years after Anderson's article, the controversy sounds quaint. With the help of our powerful new networked hardware, advances in software are enabling computers not only to operate without models (sets of rules that express how the elements of a system interact), but also to generate their own models, even if those models may not look much like anything a human would create. With technology companies declaring themselves "machine learning first", this is even becoming a standard approach.

We increasingly rely on machines that build their own models to reach conclusions, but those models often exceed human understanding and "think" about the world differently than we do.

But this comes at a price. This infusion of alien intelligence is causing us to question assumptions long embedded in the Western tradition. We used to think that knowledge was about finding order in chaos. We used to think that it was about simplifying the world. It looks as if we were wrong. Knowing the world may require us to give up on understanding it.

Models beyond human understanding

In a series of articles on machine learning, Adam Geitgey explains the basics, and from them we can see how this new type of "thinking" works:

"There are generic algorithms that can tell you something interesting about a set of data without you having to write any custom code specific to the problem. Instead of writing code, you feed data to the generic algorithm, and it builds its own logic based on the data."

For example, you give a machine learning system thousands of scans of scribbled, handwritten 8s, and it learns to identify 8s in new scans. It does so not by deriving a rule we would recognize (such as "an 8 is two circles stacked vertically") but by looking for complex patterns of dark pixels, expressed as matrices of numbers, a task that would be arduous for humans. In a recent example from agriculture, the same numerical pattern-matching technique taught a computer how to sort cucumbers.

Next, you can take machine learning a step further by creating an artificial neural network that models, in highly simplified form, how the human brain processes signals. The nodes of this irregular mesh turn on or off depending on the data arriving from the nodes connected to them; those connections have different weights, so some are more likely than others to switch on their neighbors. Although artificial neural networks date back to the 1950s, they are only now coming into their own thanks to advances in computing power, storage and numerical computation. The result of this increasingly sophisticated branch of computer science can be deep learning, in which many layers of neural networks produce results from so many variables under so many different conditions that humans cannot comprehend the model the computer has built for itself.
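The description above, nodes that switch on or off depending on weighted signals from the nodes connected to them, can be sketched in a few lines of Python. This is only a toy illustration: the weights, biases and inputs are made-up numbers, and the function names `step` and `layer` are hypothetical, not taken from any real system.

```python
# Toy sketch only: every number below is invented for illustration.

def step(x):
    """A node switches on (1) if its total input is positive, otherwise off (0)."""
    return 1 if x > 0 else 0

def layer(inputs, weights, biases):
    """Each node sums its weighted incoming signals, adds a bias, then switches on or off."""
    return [
        step(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

inputs = [1, 0, 1]                                        # three incoming signals
hidden_weights = [[0.6, -0.4, 0.3], [-0.5, 0.9, -0.2]]    # two hidden nodes
hidden_biases = [-0.5, 0.1]
output_weights = [[1.0, -1.0]]                            # one output node
output_biases = [-0.5]

hidden = layer(inputs, hidden_weights, hidden_biases)     # first layer
output = layer(hidden, output_weights, output_biases)     # next layer reads the first
print(hidden, output)
```

Even in this tiny network the "reason" for the output is just arithmetic on the weights. A real deep network repeats the same step across many layers and millions of connections, which is why its reasoning is so hard to follow.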

Yet such models work. This is how Google's AlphaGo program defeated the third-highest ranked Go player in the world. Programming a machine to play Go is much harder than getting it to sort cucumbers. After all, Go has about 10^350 possible moves, chess has about 10^123, and the universe contains about 10^80 atoms. Google's hardware was not especially formidable: just 48 processors plus eight graphics processors, which turned out to be enough to perform the required calculations.

AlphaGo was trained on 30 million board positions that occurred in 160,000 games played by humans, noting the moves the human players made and learning what counts as a legal move, along with other basic rules. Using deep learning techniques that refine the patterns recognized by the layers of the neural network, the system trained itself on which moves have the highest probability of winning.

Although AlphaGo has proven itself a world-class player, it cannot state practical maxims from which human players could learn. The program works not by developing general rules of play but by analyzing which moves have the best chance of succeeding given a particular board configuration. By contrast, Deep Blue, IBM's chess-playing program, had some general principles of good play built in. As Christof Koch wrote in an article in Scientific American, AlphaGo's intelligence lies in the billions of connections among its simulated neurons. It creates a model that enables it to make decisions, but that model is extremely complex and conditional. Nothing emerges from this mass of contingencies except victory over humans.

Therefore, if you want to use your puny human brain to understand why AlphaGo chose a particular move, its "explanation" is likely to consist of a network of weighted connections that pass their results on to the next layer of the neural network. Your brain cannot remember all those weights, and even if it could, it could not perform the calculation that produces the network's next state. And even if it could do that, you would not have learned how to play Go, or how AlphaGo plays Go, just as internalizing a diagram of a human player's neural states would not help you understand why she made a particular move.

Go is just a game, so it may not matter that we cannot follow AlphaGo's decision path. But what if neural networks let us analyze the gene interactions in two-gene diseases? What if neural networks are used to distinguish the decay patterns of single and multiple particles at the Large Hadron Collider? What if machine learning helps identify which of the 20 climate change models tracked by the Intergovernmental Panel on Climate Change is the most accurate? Such machines can give us very good results (for example, "Congratulations! You just found a Higgs boson!") but we cannot follow their "reasoning."

Clearly, our computers have surpassed us in their ability to discriminate, find patterns, and draw conclusions. That is one of the reasons we use them. We can now let our computers make models as large as they need, rather than reducing phenomena to fit relatively simple models. But this also seems to mean that what we know depends on the output of machines whose workings we cannot follow, explain or understand.

Ever since we first started notching sticks, we have used things in the world to help us understand the world. But never before have we relied on things that do not mirror human patterns of reasoning (we knew what each notch stood for) and whose path to an answer we cannot later retrace. If knowledge has always meant justified true belief, a concept that goes back more than 2,000 years to the ancient Greek philosopher Plato, then how are we to understand a new type of knowledge whose justification is not merely difficult to explain but impossible?

Two famous models

In 1943, the US Army Corps of Engineers put Italian and German prisoners of war to work building the largest scale model in history: 200 acres representing the 41% of the United States that drains into the Mississippi River. By 1949, it was being used to run simulations to determine what would happen to towns along the way if water flooded in from this point or that. The model is credited with helping to prevent flooding in Omaha in 1952 that could have caused 65 million U.S. dollars in damage. In fact, some people claim that its simulations are more accurate than the existing digital models.

Water is also a key component of another famous physical model: the MONIAC (Monetary National Income Analogue Computer), an economic simulator built in 1949 by the New Zealand economist Alban William Housego Phillips. The MONIAC used colored water in transparent pipes to simulate the effects of Keynesian economic policies. It was not as reliable as the Mississippi model, probably because it did not take into account all the variables that affect the state of an economy. But the flow of water through a river like the Mississippi is also affected by more variables than humans can list. How does the Mississippi model arrive at predictions that so closely match reality?

If you want to predict what will happen when you place a boulder at the edge of a fast-flowing stream, you do not have to understand every aspect of fluid dynamics: you only need to build a scale model that puts a small rock into a small stream. As long as scale does not matter, your model will give you the answer. As Stanford Gibson, a senior hydraulic engineer, said of the Mississippi River Basin Model, "The physical model will simulate all the processes on its own."

The MONIAC uses flows of water to simulate an economic theory, with "storage tanks representing households, businesses, government, and the export and import sectors of the economy", and measures income, expenditure, and GDP (gross domestic product). The variables it considers are limited by the number of valves, tubes, and tanks that can fit into a refrigerator-sized device.

The Mississippi River Basin Model seems to make no assumptions about what affects flooding, other than that floods will not occur unless you put more water into the system. But of course that is not quite true. The model assumes that what happens at full scale also happens at 1/2000 scale. In fact, the model has a horizontal scale of 1/2000 and a vertical scale of 1/100, a design meant to "make sure that the terrain changes are obvious." The result is a disproportionate Rocky Mountains that rises 50 feet above the ground. The model's builders assumed that the height of the mountains would not affect the results of their experiments, and they were clearly right. Similarly, they did not simulate the position of the moon or plant miniature crops in the fields, because they assumed that those factors were irrelevant.

Therefore, the "theory-free" model of the Mississippi works not only because "the physical model will simulate all processes on its own", but also because the physical model embodies assumptions about which factors matter, and those assumptions produce accurate results for the purposes for which the model was built. Using the Mississippi model to simulate the effects of climate change, or the effect of paddle wheels on algae growth, would not produce reliable results, because those effects may depend on factors that are not in the model and because they are very sensitive to scale.

Even where the Mississippi model works, we do not understand why or how it works. It is not built on a mathematical model of the Mississippi River basin, and it does not produce one. Indeed, it works because it does not require us to understand it: it lets the physics of the simulation play out on its own, without imposing the limits of human reasoning on it. As a result, this model is more accurate than a model such as the MONIAC that is built on human theory and understanding.

Until the emergence of machine learning, we had no choice but to design our models by hand and have computers implement them. We assumed that the way to improve predictive power was to make the models more detailed and accurate while accumulating more and better data for them to work on. Because those models were products of the human mind, knowledge and understanding could go hand in hand.

But that assumption itself rested on an unstated assumption.

The assumption behind the assumption

At the Galileo Museum in Florence there is a celestial sphere (an armillary sphere) dating back to 1593 that dominates its room. It consists of many gears of heavy metal and gilded wood nested inside an outer ring. Set its outer meridian ring to be "perpendicular to the horizon, parallel to the actual meridian," then point it toward the sun or a known star, and it will accurately display the positions of the celestial bodies. This device yields reliable knowledge about where objects will appear in the Earth's sky, but the model it relies on is completely wrong.

This kind of celestial sphere follows the ancient Greek understanding: the Earth sits at the center of the universe, and the celestial bodies move around it in perfect circles. To simulate the non-circular, eccentric motion of the planets across the sky, the circular gears have to be connected to other circular gears in complex ways.

The ancients' understanding strikes us as fanciful. But its most fundamental assumption is the same as ours: the condition for understanding the world is that the world is knowable. If there were no similarities among things, no laws that hold uniformly in all cases, no meaningful categories of objects, and no simplicity beneath the differences, we would be lost in an unknowable chaos.

Fortunately, we do not live in such a world. Thanks to the work of Kepler, Copernicus, Galileo, Newton, and others, we can not only predict the positions of the celestial bodies more accurately than the best celestial spheres, but we can also know the world in an unprecedented way: there are laws simple enough for us to discover and understand. These laws apply everywhere and to everything. They represent the truth of the universe.

What matters to us is that the models that yield knowledge accurately reflect how the world works. Even if the celestial sphere produced exactly the same results as Newton's laws, we would insist that the model that preceded Newton is wrong. We would insist that the ancients did not understand how the world works, because the model they used does not reflect what actually happens.

We insist that a model reflect how the world works because we assume that the world the model reflects is knowable.

But we now have a different kind of model. Like traditional models, these models enable us to make accurate predictions. Like traditional models, they yield knowledge. Some of the new models, however, are incomprehensible.

The success of these models may have shown us an unsettling truth about the ancients and the unexpected origins of their traditions.

After the scarcity of computing

For the first 50 years of their existence, computers were a scarce resource. People collected the minimum amount of information they needed for a given purpose and organized it into records. That limitation was built into the computer's original input medium: punch cards. Punch cards turn information into spatial arrays, and those arrays are readable because their layout is consistent with the encoding. That consistency squeezes out what is different, unique, exceptional, and particular.

Of course, you may ask why punch cards became the chosen mechanism. The reasons are at least partly historical: Herman Hollerith, whose company later became IBM, used punch cards to automate the counting of the 1890 US census. Punch cards themselves had been developed at the end of the 18th century to control the patterns woven by jacquard looms.

Over the years, computers expanded the amount of information they could handle, but the real change came only when we connected them to a global public network. Computers can now take in all the information on the Internet. That information includes not only the contents of large data repositories but also input from sensors spread across land, sea and sky. No one regulates the structure of all that information, which has prompted the emergence of standards and protocols for handling data in unpredictable formats, preserving differences in the information rather than eliminating them for the sake of consistency. For example, a NoSQL database allows records to differ in the fields they capture. Tim Berners-Lee, the creator of the Web, coined the term "linked data" to describe information that dispenses with the notion of the record entirely and allows every facet of a topic to be expressed in a reusable form.

This makes the concept of information in the Internet age very different from that of the computer age. We now think of information more as streams than as resources stored in containers.

Our machines' vast capacity and rich connectivity are making us realize how complex and contingent our world is.

For example, Kevin Heng pointed out in an article in American Scientist what is known as the multi-scale problem: "small disturbances in the system" can have huge effects across a myriad of sizes and time scales.

Models have always been simplifications: they confine the investigation to the factors we can observe and track. For thousands of years, we assumed that the simplicity of our models reflected the simplicity of the universe. Today, our machines are letting us see that even if the rules are simple, beautiful, and rational, the domain they govern is so detailed, so complex, and so interconnected that our brains and our knowledge cannot comprehend it. It has taken a network of humans and computers to make us realize how thoroughly the world is governed by contingency, and how chaotic it is.

Giving up knowledge

At the very beginning of Western culture's inquiry into knowledge, Plato told us that it is not enough for a belief to be true, because if true belief were enough, your lucky guess about the winner of the Preakness would have to count as knowledge. This is why knowledge in the West has had to consist of true beliefs that are justified.

Our newfound reliance on incomprehensible models as the source of justification for our beliefs puts us in a strange position. If knowledge includes the justification of our beliefs, then knowledge cannot be a class of mental content, because the justification now consists of models that exist in machines and cannot be understood by the human mind.

One response would be to stop relying on computer models that we cannot understand, so that knowledge continues to work as it always has. That would mean giving up some types of knowledge. We already forgo some types of knowledge: courts exclude certain evidence because admitting it would give the police an incentive to collect it illegally. Similarly, many research institutions require proposed projects to be approved by an Institutional Review Board in order to stop projects that might be valuable but could harm the interests of their subjects.

We have already begun to define areas where the social cost of relying on machine justification is too high. For example, Andrew Jennings, senior vice president of scores and analytics at the credit scoring company FICO, said, "There are many long-standing rules and regulations around credit scoring in the United States and elsewhere, because the law requires people who build credit scores to manage the trade-off between what is predictive and what the law allows." For example, a machine learning algorithm might find that Baptists are generally good credit risks while Anglicans are not. Even if that were true, the finding could not be used to calculate credit scores, because US law prohibits discrimination on the basis of religion or other protected classes. Credit scoring companies are also barred from using data that serves as a proxy for these attributes, such as a subscription to Baptist Week.

Credit scoring companies face other constraints on the models they may use to calculate credit risk. If a lender rejects a credit application, the lender must give the reasons why the applicant's score was not higher. To meet this requirement, FICO makes its explanations as intelligible to consumers as possible. For example, Jennings explained, an applicant might be told, "Your score was low because you were late paying your credit card bills eight times in the past year."

But what if FICO's manually created models turn out to predict credit risk less well than a neural network does? In fact, Jennings said that they recently compared credit scores produced by machine learning techniques with the results of manually created models, using the same input variables, and found that the difference was not significant. But the promise of machine learning is that there will be times when models that humans cannot understand predict much better than intelligible, manually created ones. In those cases, our knowledge, if we choose to use it, will depend on justifications we simply cannot understand.

Yet however powerful machine learning models are, we must also learn to question them. The examples of failure seem to be cases in which machine justification has not fully escaped its human origins.

For example, a system trained to assess the risk posed by people applying for bail would release white defendants but not African Americans with smaller criminal records. The system learned from human biases, because human decisions were part of its data. A system used by the Central Intelligence Agency (CIA) to identify targets for drone strikes initially flagged a prominent Al Jazeera journalist, because it had been trained on a small data set of known terrorists. Systems of this type clearly still require human supervision, especially when the task is a drone strike rather than sorting cucumbers.

Mike Williams, a research engineer at the data analysis firm Fast Forward Labs, said in a telephone interview that we need to be especially alert to the biases that often affect both which data sets are considered important and how that data is collected. For example, a recent paper discusses a project that used neural networks to predict the probability of death in pneumonia patients, in order to identify low-risk patients who could be treated as outpatients. The neural network's predictions were generally more accurate than those of manually created models that applied known rules to the data. However, the neural network clearly indicated that pneumonia patients with asthma have a lower risk of death and should therefore be treated as outpatients. This contradicts both what caregivers know and common sense. In the end, the researchers found that the result arose because pneumonia patients with asthma are sent to the intensive care unit immediately, and so their survival rate is high. Obviously this does not mean they should be sent home; they should be hospitalized. Catching this error required human supervision.
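The pneumonia example can be reproduced with a few lines of arithmetic. The sketch below uses made-up patient counts (they are not from the paper) to show how a model that looks only at outcomes would conclude that asthma lowers the risk of death, when the real cause is the intensive care those patients receive. The name `death_rate` is a hypothetical helper introduced for the illustration.

```python
# Made-up counts for illustration only; they do not come from the paper.
# Each tuple: (has_asthma, treated_in_icu, number_of_patients, number_who_died)
records = [
    (True,  True,  100,  5),   # asthma patients: all sent straight to intensive care
    (False, True,  100, 20),   # other patients who ended up in intensive care
    (False, False, 800, 80),   # other patients on ordinary wards
]

def death_rate(rows):
    """Fraction of patients in the given rows who died."""
    patients = sum(row[2] for row in rows)
    deaths = sum(row[3] for row in rows)
    return deaths / patients

asthma_rate = death_rate([row for row in records if row[0]])
no_asthma_rate = death_rate([row for row in records if not row[0]])

# The hidden factor: in this data, every asthma patient received intensive care.
asthma_always_in_icu = all(row[1] for row in records if row[0])

print(f"death rate with asthma:    {asthma_rate:.3f}")
print(f"death rate without asthma: {no_asthma_rate:.3f}")
print(f"all asthma patients were in the ICU: {asthma_always_in_icu}")
```

A model trained only on the first and last columns sees asthma as protective. Only someone who knows how the data was produced can spot that the treatment, not the condition, explains the lower death rate.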

Cathy O'Neil, author of the new book "Weapons of Math Destruction," points to the implicit bias in the values that determine which data sets we use to train computers. She gave the example of someone looking for the best candidate for a job, where one of the criteria is: "stays in the job for many years and earns promotions." If you use a machine learning algorithm for this task, you will most likely end up hiring men, because women usually stay in the same job for a shorter time. She said the same applies to using machine intelligence to identify bad teachers in the public school system. What is a bad teacher? Do you look at the average score of their students on standardized tests? At how many students eventually graduate? At how many go to college? At the students' annual income after graduation? At whether they are happy after graduation? Humans may be able to settle on a definition, but machine learning algorithms are likely to reproduce the biases implicit in the data we choose to supply them.

Therefore, we are likely to go down both paths at once. On the one hand, we will continue our tradition of prohibiting certain types of justification in order to avoid bad social consequences. At the same time, we are likely to rely more and more on machine justifications that we cannot understand.

The problem is not only that we cannot understand them, in the way that a layperson cannot understand string theory. It is also that computer-based justification is fundamentally different from human justification. It is alien.

However, "alien" does not mean "wrong." When it comes to understanding the world, machines may be closer to the truth than we humans have ever been.

Alien justification

Somewhere there is a worm more curious than its peers. It slowly works its way through the soil, tasting every patch it passes through and always seeking the next new sample, because it believes that the highest calling of a worm is to know its world, and tasting is how it acquires knowledge. With its rich experience and its outstanding powers of categorization and expression, this worm is highly regarded among its peers as a sage who can teach them what the earth tastes like.

But taste is not a property of the earth, nor a part of it. What our honorable worm tastes is the result of the interaction between its sense of taste and the chemical composition of the soil. The worm's organs let it know the world only through qualities that are correlated with some properties of the things in the earth, and the world itself is not like those qualities. As the cognitive scientist Donald Hoffman puts it, realistic perception is unlikely to make an organism more fit in evolutionary terms.

We believe that human knowledge is different from the knowledge of a clever worm. We can discern an underlying order that brings consistency and predictability to the chaos delivered by the senses. We are not worms.

Indeed. But the more detail we pour into our global network of computers, the less the world looks like a well-oiled celestial sphere. Our machines are now letting us see that even if the rules the universe plays by are not much more complicated than those of Go, the interplay of everything all at once makes it far more contingent than Aristotle, Newton, Einstein, or even some chaos theorists believed. It only looked orderly because our instruments were crude, because our concept of knowledge imposed order by simplifying matters until we found it, and because our needs were satisfied by approximations.

If you just want to sink the eight ball in the corner pocket, that is fine. But if you want to know the actual path the ball will take, then you have to consider the friction it generates at the molecular level as it passes over each fiber of the felt, the varying pull of the Moon and the wobble of the Earth, the irregular effects of the photons emitted by the overhead lamp and the side lights, and the change in airflow when your opponent holds his breath. Not to mention quantum uncertainty. None of this will affect whether you sink the ball, but it is all part of what actually happens. Even if the universe is governed by laws simple enough for us to understand, the simplest events in the universe are incomprehensible unless they are simplified.

Our machines are making us aware of this because they do not require us to reduce information to what fits into a stack of punch cards. Because they have these new capabilities, we now tend to give them all the information first and ask our questions afterwards.

Of course, the knowledge we acquire will still be only the tip of the iceberg of what the universe holds, and it remains vulnerable to our biases and assumptions. Even so, this new volume of data is making clear a truth that has held for thousands of years: knowledge is bigger than we are.

In the mid-1990s, the World Wide Web began to move us away from our traditional strategy of knowing the world by reducing what we needed to know. Knowledge quickly escaped its paper prison and took up residence online. For example, if you now want to learn about King Lear, you go online. On the Internet, our knowledge of the play lives in the links among countless sites created by literary scholars, historians, linguists, digital humanists, actors, directors, and audience members. These creators include professionals and amateurs, fools and scholars. The boundaries of our knowledge of "King Lear" are set by each person's interests and current project. Within those shifting boundaries, networked knowledge is vast, interconnected, and often inconsistent. This is what knowledge looks like at scale.

The rise of machine learning further underscores how inadequate human understanding is to the tasks it sets for itself. It is not just that knowing about the Higgs boson requires a network of hardware, software, scientists, engineers, and mathematicians. Michael Nielsen made this point in his excellent 2011 book "Reinventing Discovery: The New Era of Networked Science." After all, traditional justification of knowledge has always allowed us to rely on trustworthy sources. This is partly because we know that, in principle, we could interview everyone involved and determine whether each is a legitimate part of the justification for the Higgs boson.

However, when neural networks produce results through a process that is unlike the way humans justify knowledge, we have no such recourse. We can show that what the machine produces is probably knowledge by pointing out that AlphaGo wins at Go and that networked driverless cars have fewer accidents. But we do not necessarily understand why AlphaGo played one move rather than another, or why a driverless car turned right when it seemed obvious that it should turn left. There are simply too many inputs. The machine's decisions rest on weighing a vast number of correlations that even the most intelligent human brain cannot comprehend.

We are beginning to experience a paradigm shift in our everyday thinking about how knowledge and the world work. Where we once saw simple principles operating on relatively predictable data, we are now becoming acutely aware that what looks simple is in fact extremely complicated. We used to think that the movement of the celestial bodies was regular and that the unpredictable events of life were anomalies: mere "accidents," the concept Aristotle used to distinguish them from the "essential" properties of things. Today, the contingency of everything that happens is becoming the paradigm for our thinking.

This is leading us to locate knowledge outside our heads. We can know what we know only because we are closely bound up with alien tools of our own making. Our brain power alone is not enough.

The philosophy of pragmatism, 100 years ago, helped prepare us intellectually for this change by limiting our ambitions: knowledge is more a tool for operating in the world than a reflection of the world. The phenomenology of the German philosopher Martin Heidegger provided a different kind of correction, pointing out that the view of knowledge as a mental representation of the world is mistaken.

Andy Clark and David Chalmers have offered a more direct reshaping of knowledge, based at least in part on Heidegger: the theory of the extended mind. In a book published in 1996, Clark pointed out that we have always used tools to know things. Four thousand years ago, a shepherd who knew no mathematics needed pebbles to make sure that he brought back as many sheep as he had set out with. Physicists today are likely to need whiteboards for their cognitive work. Architects need large sheets of paper, rulers, or even 3D models to think through the structure of a building. Watson and Crick needed home-made models to work out the structure of DNA. People in these fields have now switched to computers, and shepherds may even be equipped with the latest iSheep app. But the situation is the same: we use tools to understand the world. Treating knowledge as a kind of mental content, as justified true belief, obscures that simple fact.

As long as our computer models expressed our own ideas, we could maintain the illusion that the world works the way we (and our models) do. Once our computers began to create models on their own, and those models exceeded our understanding, we lost that comforting assumption. Our machines make the limits of our epistemology obvious, and by correcting for them they reveal a truth about the universe.

The world did not happen to be designed, by God or by accident, to be knowable by the human brain. The nature of the world is closer to the way our network of computers and sensors represents it than to the way the human mind perceives it. Now that machines can operate independently, we are losing the illusion that the world just happens to be simple enough for us humans to understand.

It has taken a network of machines that we built ourselves to make us realize that we are the ones who are different. (Lebang)