# Random Uniform Deviate, or Correlation Dimension

A widely used dimension algorithm in data analysis is the correlation dimension. Fix m, a positive integer, and r, a positive real number. Given a time series of data u(1), u(2), …, u(N) from measurements equally spaced in time, form a sequence of vectors x(1), x(2), …, x(N − m + 1) in R^m, defined by x(i) = [u(i), u(i+1), …, u(i+m−1)]. Next, define for each i, 1 ≤ i ≤ N − m + 1,

C_i^m(r) = (number of j, 1 ≤ j ≤ N − m + 1, such that d[x(i), x(j)] ≤ r)/(N − m + 1) ———- [1]

We must define d[x(i), x(j)] for vectors x(i) and x(j). We define

d[x(i), x(j)] = max_{k=1,2,…,m} |u(i+k−1) − u(j+k−1)| ———- [2]

From the C_i^m(r), define

C^m(r) = (N − m + 1)^{−1} ∑_{i=1}^{N−m+1} C_i^m(r) ———- [3]

and define

β_m = lim_{r→0} lim_{N→∞} log C^m(r)/log r ———- [4]

The assertion is that for m sufficiently large, β_m is the correlation dimension. Such a limiting slope has been shown to exist for the commonly studied chaotic attractors. This procedure has frequently been applied to experimental data; investigators seek a “scaling range” of r values for which log C^m(r)/log r is nearly constant for large m, and they infer that this ratio is the correlation dimension. In some instances, investigators have concluded that this procedure establishes deterministic chaos.
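The finite-N statistic behind equations [1]–[4] is straightforward to implement. The sketch below (function names are mine) computes C^m(r) and, since finite data cannot reach the r → 0 limit, approximates β_m the way practitioners do: by fitting the slope of log C^m(r) against log r over a chosen scaling range.

```python
import numpy as np

def correlation_sum(u, m, r):
    """C^m(r) from equations [1]-[3]: average fraction of embedded
    vector pairs x(i), x(j) within Chebyshev distance r."""
    u = np.asarray(u, dtype=float)
    n = len(u) - m + 1                       # number of embedded vectors
    x = np.stack([u[i:i + n] for i in range(m)], axis=1)  # x(i) in R^m
    # d[x(i), x(j)] = max_k |u(i+k-1) - u(j+k-1)|  (equation [2])
    d = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
    c_i = np.sum(d <= r, axis=1) / n         # equation [1]
    return np.mean(c_i)                      # equation [3]

def dimension_estimate(u, m, radii):
    """Finite-data stand-in for beta_m in [4]: least-squares slope of
    log C^m(r) versus log r over the supplied scaling range of radii."""
    logs = [(np.log(r), np.log(correlation_sum(u, m, r))) for r in radii]
    lr, lc = map(np.array, zip(*logs))
    return np.polyfit(lr, lc, 1)[0]
```

For example, a linear ramp u(j) = j fills a one-dimensional set, so `dimension_estimate(range(200), 1, [5, 10, 20])` comes out near 1; the pairwise-distance matrix makes this O(N²) in memory, which is fine for a sketch but matters for long series.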

The latter conclusion is not necessarily correct: a converged, finite correlation-dimension value does not guarantee that the defining process is deterministic. Consider the following stochastic process. Fix 0 ≤ p ≤ 1. Define X_j = α^{−1/2} sin(2πj/12) for all j, where α is specified below. Define Y_j as a family of independent identically distributed (i.i.d.) real random variables, with uniform density on the interval [−√3, √3]. Define Z_j = 1 with probability p, Z_j = 0 with probability 1 − p. Set

α = (∑_{j=1}^{12} sin²(2πj/12))/12 ———- [5]

and define MIX_j = (1 − Z_j)X_j + Z_j Y_j. Intuitively, MIX(p) is generated by first ascertaining, for each j, whether the jth sample will be from the deterministic sine wave or from the random uniform deviate, with likelihood 1 − p of the former choice, then calculating either X_j or Y_j. Increasing p marks a tendency towards greater system randomness. We now show that almost surely β_m in [4] equals 0 for all m for the MIX(p) process, p ≠ 1. Fix m, define K_j = (12m)j − 12m, and define N_j = 1 if (MIX_{K_j+1}, …, MIX_{K_j+m}) = (X_1, …, X_m), N_j = 0 otherwise. The N_j are i.i.d. random variables, with the expected value of N_j, E(N_j) ≥ (1 − p)^m. By the Strong Law of Large Numbers,

lim_{N→∞} ∑_{j=1}^{N} N_j/N = E(N_1) ≥ (1 − p)^m

Observe that (∑_{j=1}^{N} N_j/(12mN))² is a lower bound to C^m(r), since x(K_i+1) = x(K_j+1) whenever N_i = N_j = 1. Thus for r < 1,

lim sup_{N→∞} log C^m(r)/log r ≤ (1/log r) log lim_{N→∞} (∑_{j=1}^{N} N_j/(12mN))² ≤ log[(1 − p)^{2m}/(12m)²]/log r

Since (1 − p)^{2m}/(12m)² is independent of r, β_m = lim_{r→0} lim_{N→∞} log C^m(r)/log r = 0 almost surely. Since β_m ≠ 0 with probability 0 for each m, by countable additivity, almost surely β_m = 0 for all m.

The MIX(p) process can be motivated by considering an autonomous unit that produces sinusoidal output, surrounded by a world of interacting processes that in ensemble produces output that resembles noise relative to the timing of the unit. The extent to which the surrounding world interacts with the unit could be controlled by a gateway between the two, with a larger gateway admitting greater apparent noise to compete with the sinusoidal signal.

It is easy to show that, given a sequence X_j, a sequence of i.i.d. Y_j defined by a density function and independent of the X_j, and Z_j = X_j + Y_j, then Z_j has an infinite correlation dimension. It appears that correlation dimension distinguishes between correlated and uncorrelated successive iterates, with larger estimates of dimension corresponding to uncorrelated data. For a more complete interpretation of correlation-dimension results, stochastic processes with correlated increments should be analyzed.

Error estimates in dimension calculations are commonly seen. In statistics, one presumes a specified underlying stochastic distribution to estimate misclassification probabilities. Without knowing the form of a distribution, or whether the system is deterministic or stochastic, one must be suspicious of error estimates. There often appears to be a desire to establish a noninteger dimension value, to give a fractal and chaotic interpretation to the result, but prior to a thorough study of the relationship between the geometric Hausdorff dimension and the time-series formula labeled correlation dimension, it is speculation to draw conclusions from a noninteger correlation-dimension value.
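The MIX(p) process itself is easy to simulate. A minimal sketch (the function name is mine; note that equation [5] gives α = 1/2, so α^{−1/2} = √2, and both components have mean 0 and variance 1, so changing p alters the correlation structure but not the marginal moments):

```python
import numpy as np

def mix(p, n, seed=0):
    """Sample n points of the MIX(p) process: each point is the
    deterministic sine X_j with probability 1-p, or an independent
    uniform deviate Y_j on [-sqrt(3), sqrt(3)] with probability p."""
    rng = np.random.default_rng(seed)
    j = np.arange(1, n + 1)
    # equation [5]: alpha = (sum of sin^2 over one period)/12 = 1/2
    alpha = np.sum(np.sin(2 * np.pi * j[:12] / 12) ** 2) / 12
    x = alpha ** -0.5 * np.sin(2 * np.pi * j / 12)       # X_j
    y = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)     # Y_j
    z = rng.random(n) < p                                # Z_j = 1 w.p. p
    return np.where(z, y, x)
```

At p = 0 the output is the pure period-12 sine; at p = 1 it is pure i.i.d. noise, yet by the argument above every MIX(p) with p < 1 still yields β_m = 0.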

# Data Governance, FinTech, Blockchain and Audits (Upcoming Bangalore Talk)

This is skeletal, I am febrile, and I am nowhere near being punctilious. The idea is to note whether this economic/financial revolution (could it even be called that?) could politically be an Overton window. So let this be otiose and information-disseminating, for a paper is on its way that forces far greater attention to detail and differs vastly from what is here.

## Data Governance and Audit Trail

Data Governance specifies the framework for decision rights and accountabilities, encouraging desirable behavior in data usage.

The main aim of Data Governance is to ensure that data assets are overseen in a cohesive and consistent enterprise-wide manner.

## Why is there a need for data governance?

- Evolving regulatory mechanisms and requirements
- Can the integrity of the data be trusted?
- Centralized versus decentralized documentation as regards the use, hermeneutics and meaning of data
- Multiplicity of data silos with exponentially rising data volumes

## Architecture

- Information Owner: approving power over internal and external data transfers, and over business plans prioritizing data integrity and data governance
- Data Steward: create, maintain and define data-access, data-mapping and data-aggregation rules
- Application Steward: maintain the application inventory, validate testing of outbound data and assist master data management
- Analytics Steward: maintain a solutions inventory, reduce redundant solutions, define rules for the use of standard definitions and report-documentation guidelines, and define data-release processes and guidelines

## What could an audit be?

It starts as a comprehensive and effective program encompassing people, processes, policies, controls, and technology. Additionally, it involves educating key stakeholders about the benefits and risks associated with poor data quality, integrity and security.

## What should audit be invested with?

Apart from IT knowledge and the operational aspects of the organization, PR skills, the ability to deal with data-related risks, and the skills to manage push-back or cultural drift are sine qua non. As we continue to operate in one of the toughest and most uneven economic climates in modern times, the role of auditors in the financial markets is more relevant than ever before. While the profession has long recognized the impact of data analysis on enhancing the quality and relevance of the audit, mainstream use of this technique has been hampered by a lack of efficient technology solutions, problems with data capture and concerns about privacy. However, recent technology advancements in big data and analytics are providing an opportunity to rethink the way in which an audit is executed.

The transformed audit will expand beyond sample-based testing to include analysis of entire populations of audit-relevant data (transaction activity and master data from key business processes), using intelligent analytics to deliver a higher quality of audit evidence and more relevant business insights. Big data and analytics are enabling auditors to better identify financial-reporting, fraud and operational business risks, and to tailor their approach to deliver a more relevant audit. While we are making significant progress and are beginning to see the benefits of big data and analytics in the audit, this is only part of a journey. What we really want is intelligent audit appliances that reside within companies’ data centers and stream the results of proprietary analytics to audit teams. But the technology to accomplish this vision is still in its infancy; in the interim, audit analytics is being delivered by processing large client data sets within a set and systemic environment, integrating analytics into the audit approach and getting companies comfortable with the future of audit. The transition to this future won’t happen overnight: it is a massive leap to go from traditional audit approaches to one that fully integrates big data and analytics in a seamless manner.

Three key areas the audit committee and finance leadership should be thinking about now when it comes to big data and analytics:

- External audit: develop a better understanding of how analytics is being used in the audit today. Since data capture is a key barrier, determine the scope of data currently being captured, and the steps being taken by the company’s IT function and its auditor to streamline data capture.
- Compliance and risk management: understand how the internal audit and compliance functions are using big data and analytics today, and management’s future plans. These techniques can have a significant impact on identifying key risks and automating monitoring processes.
- Competency development: the success of any investment in big data and analytics will be determined by the human element. Focus should not be limited to developing technical competencies, but should extend to creating the analytical mindset within the finance, risk and compliance functions to consume the analytics produced effectively.

## What is the India Stack?

A paperless and cashless delivery system; a paradigm intended to handle massive data inflows, enabling entrepreneurs, citizens and government to interact with each other transparently; an open system to verify businesses, people and services.

This is an open API policy that was conceived in 2012 to build upon Aadhaar. The word “open” in the policy signifies that other applications can access the data. It is here that the affair starts getting a bit murky, as India Stack gives the data to the concerned individual and lets him/her decide with whom the data can be shared.

So, is this FinTech? FinTech usually applies to the segment of the technology-startup scene that is disrupting sectors such as mobile payments, money transfers, loans, fundraising and even asset management. And what is the guarantee that FinTech would help prevent the fraud that traditional banking couldn’t? No technology can completely eradicate fraud and human deceit, but I believe technology can make operations more transparent and systems more accountable. To illustrate this point, let’s look back at the mortgage crisis of 2008.

Traditional banks make loans the old-fashioned way: they take money from people at certain rates (savings deposits) and lend it out to the community at a higher rate. The margin constitutes the bank’s profit. As a bank’s assets grow, so do its loans, enabling it to grow organically.

Large investment banks bundle assets into securities that they can sell on open markets all over the world. Investors trust these securities because they are rated by third party agencies such as Moody’s and Standard & Poor’s. Buyers include pension funds, hedge funds, and many other retail investment instruments.

The rating agencies are paid by the investment banks to rate these securities. Unfortunately, they determine the ratings not so much by the merits of the securities themselves as according to the stipulations of the banks. If a rating fails to meet an investment bank’s expectations, the bank can take its business to another rating agency. And if a security does not perform as per its rating, the agency has no liability! How insane is that?

Most surprisingly, investment banks can hedge against the performance of these securities (perhaps because they know that the rating is total BS?) through a complex process that I will not get into here.

Investment banks and giant insurance firms such as AIG were the major dominoes that nearly caused the whole financial system to topple in 2008. Today we face an entirely different lending industry, thanks to FinTech. What is FinTech? FinTech refers to a financial-services company (not a technology company) that uses superior technology to bring newer and better financial products to consumers. Many of today’s FinTech companies call themselves technology companies or big-data companies, but I respectfully disagree. To an outsider, a company is defined by its balance sheet, and a FinTech company’s balance sheet will tell you that it makes money from the fees, interest and service charges on its assets, not by selling or licensing technology. FinTech is good news not only for investors, borrowers and banks collectively, but also for the financial-services industry as a whole, because it ensures greater transparency and accountability while removing risk from the entire system. In the past four to five years a number of FinTech companies have gained prominence for their impact on the industry. I firmly believe that this trend has only just begun. FinTech companies are ushering in new digital business models such as auto-decisioning, models that sweep through thousands of usual and not-so-usual data sources for KYC and credit scoring.

Blockchain can be defined as a peer-to-peer-operated public digital ledger that records all transactions executed for a particular asset (…) “The Blockchain maintains this record across a network of computers, and anyone on the network can access the ledger. Blockchain is ‘decentralised’, meaning people on the network maintain the ledger, requiring no central or third-party intermediary involvement.” “Users known as ‘miners’ use specialised software to look for these time-stamped ‘blocks’, verify their accuracy using a special algorithm, and add the block to the chain. The chain maintains chronological order for all blocks added because of these time-stamps.” The digitalisation of financial services opens room for new opportunities, such as new kinds of consumer experience, the use of new technologies, and improved business data analysis. The ACPR, the French banking and insurance regulatory authority, has classified the opportunities and risks linked to FinTech: new services for users and better resilience, versus the difficulty of establishing effective supervision, the risk of regulatory dumping, and, regarding the protection of clients’ interests, data misuse and security. The French Central Bank is currently studying blockchain in cooperation with two start-ups, “Labo Blockchain” and “Blockchain France”. In that context, blockchain is a true financial-services disruption; according to Piper Alderman, “Blockchain can perform the intermediating function in a cheaper and more secure way, and disrupt the role of Banks.”
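The “time-stamped blocks verified and added to the chain” mechanics quoted above can be illustrated with a toy hash-chained ledger. This is a didactic sketch only (no mining, networking or consensus; the function names are mine): each block commits to the hash of its predecessor, which is what makes the ledger tamper-evident.

```python
import hashlib
import json
import time

def make_block(transactions, prev_hash, timestamp=None):
    """A block commits to its transactions, its timestamp, and the
    hash of the previous block; the prev_hash link forms the chain."""
    block = {
        "timestamp": timestamp if timestamp is not None else time.time(),
        "transactions": transactions,
        "prev_hash": prev_hash,
    }
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

def chain_is_valid(chain):
    """Anyone holding the ledger can re-verify it: recompute each
    block's hash and check that each block links to its predecessor."""
    for i, block in enumerate(chain):
        body = {k: v for k, v in block.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode()
        if hashlib.sha256(payload).hexdigest() != block["hash"]:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True
```

Altering any transaction in an early block changes that block’s hash, breaking every later `prev_hash` link, so tampering is detectable by anyone who re-verifies the chain.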

Hence, leading banks want to seize that financial-services opportunity. They are currently working on a blockchain project with the financial-innovation firm R3 CEV. The objective is for the project to deliver a “more efficient and cost-effective international settlement network and possibly eliminate the need to rely on central bank”. R3 CEV has announced that 40 peer banks, including HSBC, Citigroup and BNP Paribas, have started an initiative to test new kinds of transactions through blockchain. This consortium is the largest ever organized to test this new technology.

And what of security? According to the experts, “the design of the blockchain means there is the possibility of malware being injected and permanently hosted with no methods currently available to wipe this data. This could affect ‘cyber hygiene’ as well as the sharing of child sexual abuse images where the blockchain could become a safe haven for hosting such data.” Further, according to the research, “it could also enable crime scenarios in the future such as the deployment of modular malware, a reshaping of the distribution of zero-day attacks, as well as the creation of illegal underground marketplaces dealing in private keys which would allow access to this data.” The issue of cyber-security for financial institutions is highly strategic. Firstly, as these institutions rely on customer confidence, they are particularly vulnerable to data loss and fraud. Secondly, banks represent a key sector for national security. Thirdly, they are exposed to credit crises given their role in financing the economy. Lastly, data protection is a key challenge given financial-security legal requirements.

As regards cyber-security risks, one of the core legal challenges will be the accountability issue. As Blockchain is grounded on anonymity, the question is who would be accountable for the actions pursued: the users, the Blockchain owner, or the software engineer? Regulation will have to address the issue of blockchain governance. According to Hubert de Vauplane, “the more the Blockchain is open and public, less the Blockchain is governed”, “while in a private Blockchain, the governance is managed by the institution” as regards “access conditions, working, security and legal approval of transactions”. In the public Blockchain, there are no rules other than the Blockchain itself, or in other words “Code is Law”, to quote the US legal expert Lawrence Lessig. A first issue: who is the blockchain user? Two situations must be addressed, depending on whether the Blockchain is private or public. Unlike a public blockchain, a private blockchain, even though grounded in a public source code, is protected by intellectual-property rights in favour of the organism that manages it, yet is still exposed to cyber-security risks. Moreover, new contractual documentation provided by financial institutions, and a disclosure duty, could be necessary, since consumers may simply not understand the information on how their data may be used through this new technology.

‘Disruption’ has turned into a Silicon Valley cliché, something not only welcomed, but often listed as a primary goal. But disruption in the private sector can have remarkably different effects than in the political system. While capital forces may allow for relatively rapid adaptation in the market, complex political institutions can be slower to react. Moreover, while disruption in an economic market can involve the loss of some jobs and the creation of others, disruption in politics can result in political instability, armed conflict, increased refugee flows and humanitarian crises. It nevertheless is the path undertaken….

# Distributed Representation Revisited

Where the conventional symbolic model mandates the construction of a theory to address the issues pertaining to a problem, this mandatory theory construction is bypassed in distributed representational systems, since the latter are characterized by a large number of interactions occurring in a nonlinear fashion. No such attempt at theory construction is to be made in distributed representational systems, for fear of high-end abstraction draining off the nutrient that is the hallmark of the model. Distributed representation is likely to encounter onerous issues if the size of the network inflates, but the issue is addressed through what is commonly known as the redundancy technique, whereby a simultaneous encoding of the information generated by numerous interactions takes place, thus improving the adequacy with which the information is presented to the network. In the words of Paul Cilliers, this is an important point, for,

the network used for the model of a complex system will have to have the same level of complexity as the system itself….However, if the system is truly complex, a network of equal complexity may be the simplest adequate model of such a system, which means that it would be just as difficult to analyze as the system itself.

Following this, he also presents a caveat,

This has serious methodological implications for the scientists working with complex systems. A model which reduces the complexity may be easier to implement, and may even provide a number of economical descriptions of the system, but the price paid for this should be considered carefully.

One of the outstanding qualities of distributed representational systems is their adaptability: the network can be reused, applied to other problems to offer solutions. What this connotes is that the learning the network has undergone for a problem ‘A’ could be shared with a problem ‘B’, since many of the input neurons are bound by information learned through ‘A’ that could be applicable to ‘B’. In other words, the weights are the dictators for solving or resolving issues, no matter when and for which problem the learning took place. There is a slight hitch here: this quality of generalizing solutions could suffer if the level of abstraction starts to shoot up. This itself could be arrested if, in the initial stages, the right kind of framework is decided upon, reducing the hitch to an almost non-existent impacting factor. The very notion of weights is considered problematic by Sterelny, and he takes it to attack distributed representation in general and connectionism as a whole in particular. In an analogically witty paragraph, Sterelny says,

There is no distinction drawable, even in principle, between functional and non-functional connections. A positive linkage between two nodes in a distributed network might mean a constitutive link (e.g. catlike, in a network for tiger); a nomic one (carnivore, in the same network), or a merely associative one (in my case, a particular football team that plays in black and orange).

It should be noted that this criticism of weights arises because, for Sterelny, the relationship between distributed representations and the micro-features that compose them is deeply problematic. If such is the criticism, then no doubt Sterelny still seems ensconced within the conventional semantic/symbolic model. And since all weights can take part in information processing, there is a sort of democratic liberty accorded to the weights within a distributed representation, and hence any talk of constitutive, nomic, or for that matter associative links is mere humbug. Even if there prevails a disagreement that a large pattern of weights is not convincing enough as an explanation, since such patterns tend to complicate matters, distributed representational systems work consistently enough compared with an alternative system that offers explanation through reasoning, and it would be quite foolhardy to jettison distributed representation by the sheer force of criticism. If a neural network can be adapted to produce the correct answer for a number of training cases that is large compared with the size of the network, it can be trusted to respond correctly to previously unseen cases, provided they are drawn from the same population using the same distribution as the training cases, thus undermining the commonly held idea that explanations are a necessary feature of trustworthy systems (Baum and Haussler). Another objection that distributed representation faces is that, if representations are distributed, then the possibility of two representations of the same thing differing from one another cannot be ruled out.
So one of them is the true representation, while the other is only an approximation of it.(1) This is a criticism of merit and is attributed to Fodor, in his influential book Psychosemantics.(2) For, if there were only one representation, Fodor would not shy from saying that this is the yucky solution that folk psychology believes in. But since connectionism believes in the plausibility of indeterminate representations, on the question of flexibility it scores well and high over the conventional semantic/symbolic models; and is it not common sense that we encounter flexibility in our daily lives? The other response to this objection comes from post-structuralist theories (Baudrillard is quite important here; see the first footnote below). The objection of a true representation versus a copy of the true representation meets its pharmacy in post-structuralism, where meaning is constituted by synchronic as well as diachronic contextualities, thereby supplementing distributed representation with no need for concept and context, as these are inherent in the idea of such a representation itself. Sterelny still seems to ride on his obstinacy, and in a vitriolic tone demands to know why distributed representations should be regarded as states of the system at all. Moreover, he says,

It is not clear that a distributed representation is a representation for the connectionist system at all…given that the influence of node on node is local, given that there is no processor that looks at groups of nodes as a whole, it seems that seeing a distributed representation in a network is just an outsider’s perspective on the system.

This is moving around in circles, if nothing more. Or maybe he was anticipating what G. F. Marcus would write, and to some extent echo, in his book The Algebraic Mind. In the words of Marcus,

…I agree with Stemberger(3) that connectionism can make a valuable contribution to cognitive science. The only place we differ is that, first, he thinks that the contribution will be made by providing a way of eliminating symbols, whereas I think that connectionism will make its greatest contribution by accepting the importance of symbols, seeking ways of supplementing symbolic theories and seeking ways of explaining how symbols could be implemented in the brain. Second, Stemberger feels that symbols may play no role in cognition; I think that they do.

Whatever Sterelny claims, after most of the claims and counter-claims have been taken into account, the only conclusion for the time being is that distributed representation has not been undermined, his adamant position notwithstanding.
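The adaptability point made earlier, that learning for problem ‘A’ lives in the weights and can be carried over to problem ‘B’, can be made concrete with a toy sketch. Everything here (the perceptron setup, the choice of OR and AND as problems ‘A’ and ‘B’, the names) is my own illustrative assumption, not anything from the connectionism literature discussed above:

```python
import numpy as np

def train_perceptron(X, y, w=None, lr=1.0, epochs=100):
    """Online perceptron with the bias folded into the weight vector.
    Returns the learned weights and the number of corrective updates,
    a crude measure of how much learning was still needed."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    if w is None:
        w = np.zeros(Xb.shape[1])
    updates = 0
    for _ in range(epochs):
        changed = False
        for xi, yi in zip(Xb, y):
            pred = 1 if xi @ w > 0 else 0
            if pred != yi:
                w = w + lr * (yi - pred) * xi   # adjust the shared weights
                updates += 1
                changed = True
        if not changed:                          # converged on this problem
            break
    return w, updates

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_or = np.array([0, 1, 1, 1])    # problem 'A'
y_and = np.array([0, 0, 0, 1])   # problem 'B'

w_a, cost_a = train_perceptron(X, y_or)
# Reuse the weights learned on 'A' as the starting point for 'B':
w_b, cost_transfer = train_perceptron(X, y_and, w=w_a.copy())
_, cost_scratch = train_perceptron(X, y_and)
```

On this toy pair of problems the reused weights need no more corrections than a fresh start; the point is only that what was learned resides entirely in the weight vector, with no symbolic record of problem ‘A’ anywhere in the system.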

(1) This notion finds its parallel in Baudrillard’s Simulation. And subsequently, the notion would be invoked in studying the parallel nature. Of special interest is the order of simulacra in the period of post-modernity, where the simulacrum precedes the original, and the distinction between reality and representation vanishes. There is only the simulacrum and the originality becomes a totally meaningless concept.

(2) This book is known for putting folk psychology firmly on the theoretical ground by rejecting any external, holist and existential threat to its position.

(3) Joseph Paul Stemberger is a professor in the Department of Linguistics at The University of British Columbia in Vancouver, British Columbia, Canada, with primary interests in phonology, morphology, and their interactions. His theoretical orientations are towards Optimality Theory, employing his own version of the theory, and towards connectionist models.