The Coming Swarm DDoS Actions, Hacktivism, and Civil Disobedience on the Internet

hacktivism_map

On November 28, 2010, Wikileaks, along with the New York Times, Der Spiegel, El Pais, Le Monde, and The Guardian began releasing documents from a leaked cache of 2,51,287 unclassified and classified US diplomatic cables, copied from the closed Department of Defense network SIPrnet. The US government was furious. In the days that followed, different organizations and corporations began distancing themselves from Wikileaks. Amazon WebServices declined to continue hosting Wikileaks’ website, and on December 1, removed its content from its servers. The next day, the public could no longer reach the Wikileaks website at wikileaks.org; Wikileaks’ Domain Name System (DNS) provider, EveryDNS, had dropped the site from its entries on December 2, temporarily making the site inaccessible through its URL. Shortly thereafter, what would be known as the “Banking Blockade” began, with PayPal, PostFinance, Mastercard, Visa, and Bank of America refusing to process online donations to Wikileaks, essentially halting the flow of monetary donations to the organization.

Wikileaks’ troubles attracted the attention of anonymous, a loose group of internet denizens, and in particular, a small subgroup known as AnonOps, who had been engaged in a retaliatory distributed denial of service (DDoS) campaign called Operation Payback, targeting the Motion Picture Association of America and other pro-copyright, antipiracy groups since September 2010. A DDoS action is, simply, when a large number of computers attempt to access one website over and over again in a short amount of time, in the hopes of overwhelming the server, rendering it incapable of responding to legitimate requests. Anons, as members of the anonymous subculture are known, were happy to extend Operation Payback’s range of targets to include the forces arrayed against Wikileaks and its public face, Julian Assange. On December 6, they launched their first DDoS action against the website of the Swiss banking service, PostFinance. Over the course of the next 4 days, anonymous and AnonOps would launch DDoS actions against the websites of the Swedish Prosecution Authority, EveryDNS, Senator Joseph Lieberman, Mastercard, two Swedish politicians, Visa, PayPal, and amazon.com, and others, forcing many of the sites to experience at least some amount of downtime.

For many in the media and public at large, Anonymous’ December 2010 DDoS campaign was their first exposure to the use of this tactic by activists, and the exact nature of the action was unclear. Was it an activist action, a legitimate act of protest, an act of terrorism, or a criminal act? These DDoS actions – concerted efforts by many individuals to bring down websites by making repeated requests of the websites’ servers in a short amount of time – were covered extensively by the media. This coverage was inconsistent in its characterization but was open to the idea that these actions could be legitimately political in nature. In the eyes of the media and public, Operation Payback opened the door to the potential for civil disobedience and disruptive activism on the internet. But Operation Payback was far from the first use of DDoS as a tool of activism. Rather, DDoS actions have been in use for over two decades, in support of activist campaigns ranging from pro-Zapatistas actions to protests against German immigration policy and trademark enforcement disputes….

The Coming Swarm DDOS Actions, Hacktivism, and Civil Disobedience on the Internet

Advertisements

Fragmentation – Lit and Dark Electronic Exchanges. Thought of the Day 116.0

Untitled

Exchanges also control the amount and degree of granularity of the information you receive (e.g., you can use the consolidated/public feed at a low cost or pay a relatively much larger cost for direct/proprietary feeds from the exchanges). They also monetise the need for speed by renting out computer/server space next to their matching engines, a process called colocation. Through coloca­tion, exchanges can provide uniform service to trading clients at competitive rates. Having the traders’ trading engines at a common location owned by the exchange simplifies the exchange’s ability to provide uniform service as it can control the hardware connecting each client to the trading engine, the cable (so all have the same cable of the same length), and the network. This ensures that all traders in colocation have the same fast access, and are not disadvantaged (at least in terms of exchange-provided hardware). Naturally, this imposes a clear distinction between traders who are colocated and those who are not. Those not colocated will always have a speed disadvantage. It then becomes an issue for reg­ulators who have to ensure that exchanges keep access to colocation sufficiently competitive.

The issue of distance from the trading engine brings us to another key dimen­sion of trading nowadays, especially in US equity markets, namely fragmentation. A trader in US equities markets has to be aware that there are up to 13 lit electronic exchanges and more than 40 dark ones. Together with this wide range of trading options, there is also specific regulation (the so-called ‘trade-through’ rules) which affects what happens to market orders sent to one exchange if there are better execution prices at other exchanges. The interaction of multiple trading venues, latency when moving be­tween these venues, and regulation introduces additional dimensions to keep in mind when designing success l trading strategies.

The role of time is fundamental in the usual price-time priority electronic ex­change, and in a fragmented market, the issue becomes even more important. Traders need to be able to adjust their trading positions fast in response to or in anticipation of changes in market circumstances, not just at the local exchange but at other markets as well. The race to be the first in or out of a certain position is one of the focal points of the debate on the benefits and costs of ‘high-frequency trading’.

The importance of speed permeates the whole process of designing trading algorithms, from the actual code, to the choice of programming language, to the hardware it is implemented on, to the characteristics of the connection to the matching engine, and the way orders are routed within an exchange and between exchanges. Exchanges, being aware of the importance of speed, have adapted and, amongst other things, moved well beyond the basic two types of orders (Market Orders and Limit Orders). Any trader should be very well-informed regarding all the different order types available at the exchanges, what they are and how they may be used.

When coding an algorithm one should be very aware of all the possible types of orders allowed, not just in one exchange, but in all competing exchanges where one’s asset of interest is traded. Being uninformed about the variety of order types can lead to significant losses. Since some of these order types allow changes and adjustments at the trading engine level, they cannot be beaten in terms of latency by the trader’s engine, regardless of how efficiently your algorithms are coded and hardwired.

Untitled

Another important issue to be aware of is that trading in an exchange is not free, but the cost is not the same for all traders. For example, many exchanges run what is referred to as a maker-taker system of fees whereby a trader sending an MO (and hence taking liquidity away from the market) pays a trading fee, while a trader whose posted LO is filled by the MO (that is, the LO with which the MO is matched) will a pay much lower trading fee, or even receive a payment (a rebate) from the exchange for providing liquidity (making the market). On the other hand, there are markets with an inverted fee schedule, a taker-maker system where the fee structure is the reverse: those providing liquidity pay a higher fee than those taking liquidity (who may even get a rebate). The issue of exchange fees is quite important as fees distort observed market prices (when you make a transaction the relevant price for you is the net price you pay/receive, which is the published price net of fees).

Stuxnet

Untitled

Stuxnet is a threat targeting a specific industrial control system likely in Iran, such as a gas pipeline or power plant. The ultimate goal of Stuxnet is to sabotage that facility by reprogramming programmable logic controllers (PLCs) to operate as the attackers intend them to, most likely out of their specified boundaries.

Stuxnet was discovered in July, but is confirmed to have existed at least one year prior and likely even before. The majority of infections were found in Iran. Stuxnet contains many features such as:

  • Self-replicates through removable drives exploiting a vulnerability allowing auto-execution. Microsoft Windows Shortcut ‘LNK/PIF’ Files Automatic File Execution Vulnerability (BID 41732)
  • Spreads in a LAN through a vulnerability in the Windows Print Spooler.
    Microsoft Windows Print Spooler Service Remote Code Execution Vulnerability (BID 43073)
  • Spreads through SMB by exploiting the Microsoft Windows Server Service RPC Handling Remote Code Execution Vulnerability (BID 31874).
  • Copies and executes itself on remote computers through network shares.
  • Copies and executes itself on remote computers running a WinCC database server.
  • Copies itself into Step 7 projects in such a way that it automatically executes when the Step 7 project is loaded.
  • Updates itself through a peer-to-peer mechanism within a LAN.
  • Exploits a total of four unpatched Microsoft vulnerabilities, two of which are previously mentioned vulnerabilities for self-replication and the other two are escalation of privilege vulnerabilities that have yet to be disclosed.
  • Contacts a command and control server that allows the hacker to download and execute code, including updated versions.
  • Contains a Windows rootkit that hide its binaries.
  • Attempts to bypass security products.
  • Fingerprints a specific industrial control system and modifies code on the Siemens PLCs to potentially sabotage the system.
  • Hides modified code on PLCs, essentially a rootkit for PLCs.

The following is a possible attack scenario. It is only speculation driven by the technical features of Stuxnet.

Industrial control systems (ICS) are operated by a specialized assembly like code on programmable logic controllers (PLCs). The PLCs are often programmed from Windows computers not connected to the Internet or even the internal network. In addition, the industrial control systems themselves are also unlikely to be connected to the Internet.

First, the attackers needed to conduct reconnaissance. As each PLC is configured in a unique manner, the attack- ers would first need the ICS’s schematics. These design documents may have been stolen by an insider or even retrieved by an early version of Stuxnet or other malicious binary. Once attackers had the design documents and potential knowledge of the computing environment in the facility, they would develop the latest version of Stuxnet. Each feature of Stuxnet was implemented for a specific reason and for the final goal of potentially sabotaging the ICS.

Attackers would need to setup a mirrored environment that would include the necessary ICS hardware, such as PLCs, modules, and peripherals in order to test their code. The full cycle may have taken six months and five to ten core developers not counting numerous other individuals, such as quality assurance and management.

In addition their malicious binaries contained driver files that needed to be digitally signed to avoid suspicion. The attackers compromised two digital certificates to achieve this task. The attackers would have needed to obtain the digital certificates from someone who may have physically entered the premises of the two companies and stole them, as the two companies are in close physical proximity.

To infect their target, Stuxnet would need to be introduced into the target environment. This may have occurred by infecting a willing or unknowing third party, such as a contractor who perhaps had access to the facility, or an insider. The original infection may have been introduced by removable drive.

Once Stuxnet had infected a computer within the organization it began to spread in search of Field PGs, which are typical Windows computers but used to program PLCs. Since most of these computers are non-networked, Stuxnet would first try to spread to other computers on the LAN through a zero-day vulnerability, a two year old vulnerability, infecting Step 7 projects, and through removable drives. Propagation through a LAN likely served as the first step and propagation through removable drives as a means to cover the last and final hop to a Field PG that is never connected to an untrusted network.

While attackers could control Stuxnet with a command and control server, as mentioned previously the key computer was unlikely to have outbound Internet access. Thus, all the functionality required to sabotage a system was embedded directly in the Stuxnet executable. Updates to this executable would be propagated throughout the facility through a peer-to-peer method established by Stuxnet.

When Stuxnet finally found a suitable computer, one that ran Step 7, it would then modify the code on the PLC. These modifications likely sabotaged the system, which was likely considered a high value target due to the large resources invested in the creation of Stuxnet.

Victims attempting to verify the issue would not see any rogue PLC code as Stuxnet hides its modifications.

While their choice of using self-replication methods may have been necessary to ensure they’d find a suitable Field PG, they also caused noticeable collateral damage by infecting machines outside the target organization. The attackers may have considered the collateral damage a necessity in order to effectively reach the intended target. Also, the attackers likely completed their initial attack by the time they were discovered.

Stuxnet dossier

The Illicit Trade of Firearms, Explosives and Ammunition on the Dark Web

keyboard

DATACRYPTO is a web crawler/scraper class of software that systematically archives websites and extracts information from them. Once a cryptomarket has been identified, DATACRYPTO is set up to log in to the market and download its contents, beginning at the web page fixed by the researchers (typically the homepage). After downloading that page, DATACRYPTO parses it for hyperlinks to other pages hosted on the same market and follows each, adding new hyperlinks encountered, and visiting and downloading these, until no new pages are found. This process is referred to as web crawling. DATACRYPTO then switches from crawler to scraper mode, extracting information from the pages it has downloaded into a single database.

One challenge connected to crawling cryptomarkets arises when, despite appearances to the contrary, the crawler has indexed only a subset of a marketplace’s web pages. This problem is particularly exacerbated by sluggish download speeds on the Tor network which, combined with marketplace downtime, may prevent DATACRYPTO from completing the crawl of a cryptomarket. DATACRYPTO was designed to prevent partial marketplace crawls through its ‘state-aware’ capability, meaning that the result of each page request is analysed and logged by the software. In the event of service disruptions on the marketplace or on the Tor network, DATACRYPTO pauses and then attempts to continue its crawl a few minutes later. If a request for a page returns a different page (e.g. asking for a listing page and receiving the home page of the cryptomarket), the request is marked as failed, with each crawl tallying failed page requests.

DATACRYPTO is programmed for each market to extract relevant information connected to listings and vendors, which is then collected into a single database:

  • Product title;
  • Product description;
  • Listing price;
  • Number of customer feedbacks for the listing;
  • The country or region from which a vendor ships the product;
  • The country or regions to which the vendor placing the listing is willing to ship.

DATACRYPTO is not the first crawler to mirror the dark web, but is novel in its ability to pull information from a variety of cryptomarkets at once, despite differences in page structure and naming conventions across sites. For example, “$…” on one market may give you the price of a listing. On another market, price might be signified by “VALUE…” or “PRICE…” instead.

Researchers who want to create a similar tool to gather data through crawling the web should detail which information exactly they would like to extract. When building a web crawler it is, for example, very important to carefully study the structure and characteristics of the websites to be mirrored. Before setting the crawler loose, ensure that it extracts and parses correct and complete information. Because the process of building a crawler-tool like DATACRYPTO can be costly and time consuming, it is also important to anticipate on future data needs, and build in capabilities to extract that kind of data later on, so no large future modifications are necessary.

Building a complex tool like DATACRYPTO is no easy feat. The crawler needs to be able to copy pages, but also stealthily get around CAPTCHAs and log itself in onto the TOR server. Due to their bulkiness, web crawlers can place a heavy burden on a website’s server, and are easily detected due to their repetitive pattern moving between pages. Site administrators are therefore not afraid to IP-ban badly designed crawlers from their sites.

The Illicit Trade of Firearms Explosives and Ammunition on the Dark Web

Cryptocurrency and Efficient Market Hypothesis. Drunken Risibility.

According to the traditional definition, a currency has three main properties: (i) it serves as a medium of exchange, (ii) it is used as a unit of account and (iii) it allows to store value. Along economic history, monies were related to political power. In the beginning, coins were minted in precious metals. Therefore, the value of a coin was intrinsically determined by the value of the metal itself. Later, money was printed in paper bank notes, but its value was linked somewhat to a quantity in gold, guarded in the vault of a central bank. Nation states have been using their political power to regulate the use of currencies and impose one currency (usually the one issued by the same nation state) as legal tender for obligations within their territory. In the twentieth century, a major change took place: abandoning gold standard. The detachment of the currencies (specially the US dollar) from the gold standard meant a recognition that the value of a currency (specially in a world of fractional banking) was not related to its content or representation in gold, but to a broader concept as the confidence in the economy in which such currency is based. In this moment, the value of a currency reflects the best judgment about the monetary policy and the “health” of its economy.

In recent years, a new type of currency, a synthetic one, emerged. We name this new type as “synthetic” because it is not the decision of a nation state, nor represents any underlying asset or tangible wealth source. It appears as a new tradable asset resulting from a private agreement and facilitated by the anonymity of internet. Among this synthetic currencies, Bitcoin (BTC) emerges as the most important one, with a market capitalization of a few hundred million short of $80 billions.

bitcoin-price-bitstamp-sept1

Bitcoin Price Chart from Bitstamp

There are other cryptocurrencies, based on blockchain technology, such as Litecoin (LTC), Ethereum (ETH), Ripple (XRP). The website https://coinmarketcap.com/currencies/ counts up to 641 of such monies. However, as we can observe in the figure below, Bitcoin represents 89% of the capitalization of the market of all cryptocurrencies.

Untitled

Cryptocurrencies. Share of market capitalization of each currency.

One open question today is if Bitcoin is in fact a, or may be considered as a, currency. Until now, we cannot observe that Bitcoin fulfills the main properties of a standard currency. It is barely (though increasingly so!) accepted as a medium of exchange (e.g. to buy some products online), it is not used as unit of account (there are no financial statements valued in Bitcoins), and we can hardly believe that, given the great swings in price, anyone can consider Bitcoin as a suitable option to store value. Given these characteristics, Bitcoin could fit as an ideal asset for speculative purposes. There is no underlying asset to relate its value to and there is an open platform to operate round the clock.

Untitled

Bitcoin returns, sampled every 5 hours.

Speculation has a long history and it seems inherent to capitalism. One common feature of speculative assets in history has been the difficulty in valuation. Tulipmania, the South Sea bubble, and more others, reflect on one side human greedy behavior, and on the other side, the difficulty to set an objective value to an asset. All speculative behaviors were reflected in a super-exponential growth of the time series.

Cryptocurrencies can be seen as the libertarian response to central bank failure to manage financial crises, as the one occurred in 2008. Also cryptocurrencies can bypass national restrictions to international transfers, probably at a cheaper cost. Bitcoin was created by a person or group of persons under the pseudonym Satoshi Nakamoto. The discussion of Bitcoin has several perspectives. The computer science perspective deals with the strengths and weaknesses of blockchain technology. In fact, according to R. Ali et. al., the introduction of a “distributed ledger” is the key innovation. Traditional means of payments (e.g. a credit card), rely on a central clearing house that validate operations, acting as “middleman” between buyer and seller. On contrary, the payment validation system of Bitcoin is decentralized. There is a growing army of miners, who put their computer power at disposal of the network, validating transactions by gathering together blocks, adding them to the ledger and forming a ’block chain’. This work is remunerated by giving the miners Bitcoins, what makes (until now) the validating costs cheaper than in a centralized system. The validation is made by solving some kind of algorithm. With the time solving the algorithm becomes harder, since the whole ledger must be validated. Consequently it takes more time to solve it. Contrary to traditional currencies, the total number of Bitcoins to be issued is beforehand fixed: 21 million. In fact, the issuance rate of Bitcoins is expected to diminish over time. According to Laursen and Kyed, validating the public ledger was initially rewarded with 50 Bitcoins, but the protocol foresaw halving this quantity every four years. At the current pace, the maximum number of Bitcoins will be reached in 2140. Taking into account the decentralized character, Bitcoin transactions seem secure. All transactions are recorded in several computer servers around the world. In order to commit fraud, a person should change and validate (simultaneously) several ledgers, which is almost impossible. Additional, ledgers are public, with encrypted identities of parties, making transactions “pseudonymous, not anonymous”. The legal perspective of Bitcoin is fuzzy. Bitcoin is not issued, nor endorsed by a nation state. It is not an illegal substance. As such, its transaction is not regulated.

In particular, given the nonexistence of saving accounts in Bitcoin, and consequently the absense of a Bitcoin interest rate, precludes the idea of studying the price behavior in relation with cash flows generated by Bitcoins. As a consequence, the underlying dynamics of the price signal, finds the Efficient Market Hypothesis as a theoretical framework. The Efficient Market Hypothesis (EMH) is the cornerstone of financial economics. One of the seminal works on the stochastic dynamics of speculative prices is due to L Bachelier, who in his doctoral thesis developed the first mathematical model concerning the behavior of stock prices. The systematic study of informational efficiency begun in the 1960s, when financial economics was born as a new area within economics. The classical definition due to Eugene Fama (Foundations of Finance_ Portfolio Decisions and Securities Prices 1976-06) says that a market is informationally efficient if it “fully reflects all available information”. Therefore, the key element in assessing efficiency is to determine the appropriate set of information that impels prices. Following Efficient Capital Markets, informational efficiency can be divided into three categories: (i) weak efficiency, if prices reflect the information contained in the past series of prices, (ii) semi-strong efficiency, if prices reflect all public information and (iii) strong efficiency, if prices reflect all public and private information. As a corollary of the EMH, one cannot accept the presence of long memory in financial time series, since its existence would allow a riskless profitable trading strategy. If markets are informationally efficient, arbitrage prevent the possibility of such strategies. If we consider the financial market as a dynamical structure, short term memory can exist (to some extent) without contradicting the EMH. In fact, the presence of some mispriced assets is the necessary stimulus for individuals to trade and reach an (almost) arbitrage free situation. However, the presence of long range memory is at odds with the EMH, because it would allow stable trading rules to beat the market.

The presence of long range dependence in financial time series generates a vivid debate. Whereas the presence of short term memory can stimulate investors to exploit small extra returns, making them disappear, long range correlations poses a challenge to the established financial model. As recognized by Ciaian et. al., Bitcoin price is not driven by macro-financial indicators. Consequently a detailed analysis of the underlying dynamics (Hurst exponent) becomes important to understand its emerging behavior. There are several methods (both parametric and non parametric) to calculate the Hurst exponent, which become a mandatory framework to tackle BTC trading.

String’s Depth of Burial

string_conversion_03.2013

A string’s depth might be defined as the execution time of its minimal program.

The difficulty with this definition arises in cases where the minimal program is only a few bits smaller than some much faster program, such as a print program, to compute the same output x. In this case, slight changes in x may induce arbitrarily large changes in the run time of the minimal program, by changing which of the two competing programs is minimal. Analogous instability manifests itself in translating programs from one universal machine to another. This instability emphasizes the essential role of the quantity of buried redundancy, not as a measure of depth, but as a certifier of depth. In terms of the philosophy-of-science metaphor, an object whose minimal program is only a few bits smaller than its print program is like an observation that points to a nontrivial hypothesis, but with only a low level of statistical confidence.

To adequately characterize a finite string’s depth one must therefore consider the amount of buried redundancy as well as the depth of its burial. A string’s depth at significance level s might thus be defined as that amount of time complexity which is attested by s bits worth of buried redundancy. This characterization of depth may be formalized in several ways.

A string’s depth at significance level s be defined as the time required to compute the string by a program no more than s bits larger than the minimal program.

This definition solves the stability problem, but is unsatisfactory in the way it treats multiple programs of the same length. Intuitively, 2k distinct (n + k)-bit programs that compute same output ought to be accorded the same weight as one n-bit program; but, by the present definition, they would be given no more weight than one (n + k)-bit program.

A string’s depth at signicifcance level s depth might be defined as the time t required for the string’s time-bounded algorithmic probability Pt(x) to rise to within a factor 2−s of its asymptotic time-unbounded value P(x).

This formalizes the notion that for the string to have originated by an effective process of t steps or fewer is less plausible than for the first s tosses of a fair coin all to come up heads.

It is not known whether there exist strings that are deep, in other words, strings having no small fast programs, even though they have enough large fast programs to contribute a significant fraction of their algorithmic probability. Such strings might be called deterministically deep but probabilistically shallow, because their chance of being produced quickly in a probabilistic computation (e.g. one where the input bits of U are supplied by coin tossing) is significant compared to their chance of being produced slowly. The question of whether such strings exist is probably hard to answer because it does not relativize uniformly. Deterministic and probabilistic depths are not very different relative to a random coin-toss oracle A of the equality of random-oracle-relativized deterministic and probabilistic polynomial time complexity classes; but they can be very different relative to an oracle B deliberately designed to hide information from deterministic computations (this parallels Hunt’s proof that deterministic and probabilistic polynomial time are unequal relative to such an oracle).

(Depth of Finite Strings): Let x and w be strings and s a significance parameter. A string’s depth at significance level s, denoted Ds(x), will be defined as min{T(p) : (|p|−|p| < s)∧(U(p) = x)}, the least time required to compute it by a s-incompressible program. At any given significance level, a string will be called t-deep if its depth exceeds t, and t-shallow otherwise.

The difference between this definition and the previous one is rather subtle philosophically and not very great quantitatively. Philosophically, when each individual hypothesis for the rapid origin of x is implausible at the 2−s confidence level, then it requires only that a weighted average of all such hypotheses be implausible.

There exist constants c1 and c2 such that for any string x, if programs running in time ≤ t contribute a fraction between 2−s and 2−s+1 of the string’s total algorithmic probability, then x has depth at most t at significance level s + c1 and depth at least t at significance level s − min{H(s), H(t)} − c2.

Proof : The first part follows easily from the fact that any k-compressible self-delimiting program p is associated with a unique, k − O(1) bits shorter, program of the form “execute the result of executing p∗”. Therefore there exists a constant c1 such that if all t-fast programs for x were s + c1– compressible, the associated shorter programs would contribute more than the total algorithmic probability of x. The second part follows because, roughly, if fast programs contribute only a small fraction of the algorithmic probability of x, then the property of being a fast program for x is so unusual that no program having that property can be random. More precisely, the t-fast programs for x constitute a finite prefix set, a superset S of which can be computed by a program of size H(x) + min{H(t), H(s)} + O(1) bits. (Given x∗ and either t∗ or s∗, begin enumerating all self-delimiting programs that compute x, in order of increasing running time, and quit when either the running time exceeds t or the accumulated measure of programs so far enumerated exceeds 2−(H(x)−s)). Therefore there exists a constant c2 such that, every member of S, and thus every t-fast program for x, is compressible by at least s − min{H(s), H(t)} − O(1) bits.

The ability of universal machines to simulate one another efficiently implies a corresponding degree of machine-independence for depth: for any two efficiently universal machines of the sort considered here, there exists a constant c and a linear polynomial L such that for any t, strings whose (s+c)-significant depth is at least L(t) on one machine will have s-significant depth at least t on the other.

Depth of one string relative to another may be defined analogously, and represents the plausible time required to produce one string, x, from another, w.

(Relative Depth of Finite Strings): For any two strings w and x, the depth of x relative to w at significance level s, denoted Ds(x/w), will be defined as min{T(p, w) : (|p|−|(p/w)∗| < s)∧(U(p, w) = x)}, the least time required to compute x from w by a program that is s-incompressible relative to w.

Depth of a string relative to its length is a particularly useful notion, allowing us, as it were, to consider the triviality or nontriviality of the “content” of a string (i.e. its bit sequence), independent of its “form” (length). For example, although the infinite sequence 000… is intuitively trivial, its initial segment 0n is deep whenever n is deep. However, 0n is always shallow relative to n, as is, with high probability, a random string of length n.

In order to adequately represent the intuitive notion of stored mathematical work, it is necessary that depth obey a “slow growth” law, i.e. that fast deterministic processes be unable to transform a shallow object into a deep one, and that fast probabilistic processes be able to do so only with low probability.

(Slow Growth Law): Given any data string x and two significance parameters s2 > s1, a random program generated by coin tossing has probability less than 2−(s2−s1)+O(1) of transforming x into an excessively deep output, i.e. one whose s2-significant depth exceeds the s1-significant depth of x plus the run time of the transforming program plus O(1). More precisely, there exist positive constants c1, c2 such that for all strings x, and all pairs of significance parameters s2 > s1, the prefix set {q : Ds2(U(q, x)) > Ds1(x) + T(q, x) + c1} has measure less than 2−(s2−s1)+c2.

Proof: Let p be a s1-incompressible program which computes x in time Ds1(x), and let r be the restart prefix mentioned in the definition of the U machine. Let Q be the prefix set {q : Ds2(U(q, x)) > T(q, x) + Ds1(x) + c1}, where the constant c1 is sufficient to cover the time overhead of concatenation. For all q ∈ Q, the program rpq by definition computes some deep result U(q, x) in less time than that result’s own s2-significant depth, and so rpq must be compressible by s2 bits. The sum of the algorithmic probabilities of strings of the form rpq, where q ∈ Q, is therefore

Σq∈Q P(rpq)< Σq∈Q 2−|rpq| + s2 = 2−|r|−|p|+s2 μ(Q)

On the other hand, since the self-delimiting program p can be recovered from any string of the form rpq (by deleting r and executing the remainder pq until halting occurs, by which time exactly p will have been read), the algorithmic probability of p is at least as great (within a constant factor) as the sum of the algorithmic probabilities of the strings {rpq : q ∈ Q} considered above:

P(p) > μ(Q) · 2−|r|−|p|+s2−O(1)

Recalling the fact that minimal program size is equal within a constant factor to the −log of algorithmic probability, and the s1-incompressibility of p, we have P(p) < 2−(|p|−s1+O(1)), and therefore finally

μ(Q) < 2−(s2−s1)+O(1), which was to be demonstrated.

Universal Turing Machine: Algorithmic Halting

169d342be4ac9fdca10d1c8c9c04c3df

A natural number x will be identified with the x’th binary string in lexicographic order (Λ,0,1,00,01,10,11,000…), and a set X of natural numbers will be identified with its characteristic sequence, and with the real number between 0 and 1 having that sequence as its dyadic expansion. The length of a string x will be denoted |x|, the n’th bit of an infinite sequence X will be noted X(n), and the initial n bits of X will be denoted Xn. Concatenation of strings p and q will be denoted pq.

We now define the information content (and later the depth) of finite strings using a universal Turing machine U. A universal Turing machine may be viewed as a partial recursive function of two arguments. It is universal in the sense that by varying one argument (“program”) any partial recursive function of the other argument (“data”) can be obtained. In the usual machine formats, program, data and output are all finite strings, or, equivalently, natural numbers. However, it is not possible to take a uniformly weighted average over a countably infinite set. Chaitin’s universal machine has two tapes: a read-only one-way tape containing the infinite program; and an ordinary two-way read/write tape, which is used for data input, intermediate work, and output, all of which are finite strings. Our machine differs from Chaitin’s in having some additional auxiliary storage (e.g. another read/write tape) which is needed only to improve the time efficiency of simulations.

We consider only terminating computations, during which, of course, only a finite portion of the program tape can be read. Therefore, the machine’s behavior can still be described by a partial recursive function of two string arguments U(p, w), if we use the first argument to represent that portion of the program that is actually read in the course of a particular computation. The expression U (p, w) = x will be used to indicate that the U machine, started with any infinite sequence beginning with p on its program tape and the finite string w on its data tape, performs a halting computation which reads exactly the initial portion p of the program, and leaves output data x on the data tape at the end of the computation. In all other cases (reading less than p, more than p, or failing to halt), the function U(p, w) is undefined. Wherever U(p, w) is defined, we say that p is a self-delimiting program to compute x from w, and we use T(p, w) to represent the time (machine cycles) of the computation. Often we will consider computations without input data; in that case we abbreviate U(p, Λ) and T(p, Λ) as U(p) and T(p) respectively.

The self-delimiting convention for the program tape forces the domain of U and T, for each data input w, to be a prefix set, that is, a set of strings no member of which is the extension of any other member. Any prefix set S obeys the Kraft inequality

p∈S 2−|p| ≤ 1

Besides being self-delimiting with regard to its program tape, the U machine must be efficiently universal in the sense of being able to simulate any other machine of its kind (Turing machines with self-delimiting program tape) with at most an additive constant constant increase in program size and a linear increase in execution time.

Without loss of generality we assume that there exists for the U machine a constant prefix r which has the effect of stacking an instruction to restart the computation when it would otherwise end. This gives the machine the ability to concatenate programs to run consecutively: if U(p, w) = x and U(q, x) = y, then U(rpq, w) = y. Moreover, this concatenation should be efficient in the sense that T (rpq, w) should exceed T (p, w) + T (q, x) by at most O(1). This efficiency of running concatenated programs can be realized with the help of the auxiliary storage to stack the restart instructions.

Sometimes we will generalize U to have access to an “oracle” A, i.e. an infinite look-up table which the machine can consult in the course of its computation. The oracle may be thought of as an arbitrary 0/1-valued function A(x) which the machine can cause to be evaluated by writing the argument x on a special tape and entering a special state of the finite control unit. In the next machine cycle the oracle responds by sending back the value A(x). The time required to evaluate the function is thus linear in the length of its argument. In particular we consider the case in which the information in the oracle is random, each location of the look-up table having been filled by an independent coin toss. Such a random oracle is a function whose values are reproducible, but otherwise unpredictable and uncorrelated.

Let {φAi (p, w): i = 0,1,2…} be an acceptable Gödel numbering of A-partial recursive functions of two arguments and {φAi (p, w)} an associated Blum complexity measure, henceforth referred to as time. An index j is called self-delimiting iff, for all oracles A and all values w of the second argument, the set { x : φAj (x, w) is defined} is a prefix set. A self-delimiting index has efficient concatenation if there exists a string r such that for all oracles A and all strings w, x, y, p, and q,if φAj (p, w) = x and φAj (q, x) = y, then φAj(rpq, w) = y and φAj (rpq, w) = φAj (p, w) + φAj (q, x) + O(1). A self-delimiting index u with efficient concatenation is called efficiently universal iff, for every self-delimiting index j with efficient concatenation, there exists a simulation program s and a linear polynomial L such that for all oracles A and all strings p and w, and

φAu(sp, w) = φAj (p, w)

and

ΦAu(sp, w) ≤ L(ΦAj (p, w))

The functions UA(p,w) and TA(p,w) are defined respectively as φAu(p, w) and ΦAu(p, w), where u is an efficiently universal index.

For any string x, the minimal program, denoted x∗, is min{p : U(p) = x}, the least self-delimiting program to compute x. For any two strings x and w, the minimal program of x relative to w, denoted (x/w)∗, is defined similarly as min{p : U(p,w) = x}.

By contrast to its minimal program, any string x also has a print program, of length |x| + O(log|x|), which simply transcribes the string x from a verbatim description of x contained within the program. The print program is logarithmically longer than x because, being self-delimiting, it must indicate the length as well as the contents of x. Because it makes no effort to exploit redundancies to achieve efficient coding, the print program can be made to run quickly (e.g. linear time in |x|, in the present formalism). Extra information w may help, but cannot significantly hinder, the computation of x, since a finite subprogram would suffice to tell U to simply erase w before proceeding. Therefore, a relative minimal program (x/w)∗ may be much shorter than the corresponding absolute minimal program x∗, but can only be longer by O(1), independent of x and w.

A string is compressible by s bits if its minimal program is shorter by at least s bits than the string itself, i.e. if |x∗| ≤ |x| − s. Similarly, a string x is said to be compressible by s bits relative to a string w if |(x/w)∗| ≤ |x| − s. Regardless of how compressible a string x may be, its minimal program x∗ is compressible by at most an additive constant depending on the universal computer but independent of x. [If (x∗)∗ were much smaller than x∗, then the role of x∗ as minimal program for x would be undercut by a program of the form “execute the result of executing (x∗)∗.”] Similarly, a relative minimal program (x/w)∗ is compressible relative to w by at most a constant number of bits independent of x or w.

The algorithmic probability of a string x, denoted P(x), is defined as {∑2−|p| : U(p) = x}. This is the probability that the U machine, with a random program chosen by coin tossing and an initially blank data tape, will halt with output x. The time-bounded algorithmic probability, Pt(x), is defined similarly, except that the sum is taken only over programs which halt within time t. We use P(x/w) and Pt(x/w) to denote the analogous algorithmic probabilities of one string x relative to another w, i.e. for computations that begin with w on the data tape and halt with x on the data tape.

The algorithmic entropy H(x) is defined as the least integer greater than −log2P(x), and the conditional entropy H(x/w) is defined similarly as the least integer greater than − log2P(x/w). Among the most important properties of the algorithmic entropy is its equality, to within O(1), with the size of the minimal program:

∃c∀x∀wH(x/w) ≤ |(x/w)∗| ≤ H(x/w) + c

The first part of the relation, viz. that algorithmic entropy should be no greater than minimal program size, is obvious, because of the minimal program’s own contribution to the algorithmic probability. The second half of the relation is less obvious. The approximate equality of algorithmic entropy and minimal program size means that there are few near-minimal programs for any given input/output pair (x/w), and that every string gets an O(1) fraction of its algorithmic probability from its minimal program.

Finite strings, such as minimal programs, which are incompressible or nearly so are called algorithmically random. The definition of randomness for finite strings is necessarily a little vague because of the ±O(1) machine-dependence of H(x) and, in the case of strings other than self-delimiting programs, because of the question of how to count the information encoded in the string’s length, as opposed to its bit sequence. Roughly speaking, an n-bit self-delimiting program is considered random (and therefore not ad-hoc as a hypothesis) iff its information content is about n bits, i.e. iff it is incompressible; while an externally delimited n-bit string is considered random iff its information content is about n + H(n) bits, enough to specify both its length and its contents.

For infinite binary sequences (which may be viewed also as real numbers in the unit interval, or as characteristic sequences of sets of natural numbers) randomness can be defined sharply: a sequence X is incompressible, or algorithmically random, if there is an O(1) bound to the compressibility of its initial segments Xn. Intuitively, an infinite sequence is random if it is typical in every way of sequences that might be produced by tossing a fair coin; in other words, if it belongs to no informally definable set of measure zero. Algorithmically random sequences constitute a larger class, including sequences such as Ω which can be specified by ineffective definitions.

The busy beaver function B(n) is the greatest number computable by a self-delimiting program of n bits or fewer. The halting set K is {x : φx(x) converges}. This is the standard representation of the halting problem.

The self-delimiting halting set K0 is the (prefix) set of all self-delimiting programs for the U machine that halt: {p : U(p) converges}.

K and K0 are readily computed from one another (e.g. by regarding the self-delimiting programs as a subset of ordinary programs, the first 2n bits of K0 can be recovered from the first 2n+O(1) bits of K; by encoding each n-bit ordinary program as a self-delimiting program of length n + O(log n), the first 2n bits of K can be recovered from the first 2n+O(log n) bits of K0.)

The halting probability Ω is defined as {2−|p| : U(p) converges}, the probability that the U machine would halt on an infinite input supplied by coin tossing. Ω is thus a real number between 0 and 1.

The first 2n bits of K0 can be computed from the first n bits of Ω, by enumerating halting programs until enough have halted to account for all but 2−n of the total halting probability. The time required for this decoding (between B(n − O(1)) and B(n + H(n) + O(1)) grows faster than any computable function of n. Although K0 is only slowly computable from Ω, the first n bits of Ω can be rapidly computed from the first 2n+H(n)+O(1) bits of K0, by asking about the halting of programs of the form “enumerate halting programs until (if ever) their cumulative weight exceeds q, then halt”, where q is an n-bit rational number…