Wednesday, February 06, 2013

Baffling Conservative Economics


Dear Hank,

I am baffled by the stupidity of those who call themselves by various names all in the cause of trying to get our economy on track by what they call conservative principles.
These usually call for cuts to government spending.
And then the mantra goes the economy will magically get better.
What I always find in these arguments is that element of magic. 
An unconnected set of actions with a remarkable outcome unsupported by theory, history, fact, or anything else I can find except a magic phrase like: “The will of the people” or “unleashing American productivity.”
Typically, when confronted these folks say something like, “Well I can tell you my life hasn’t gotten better in this economy.” or something like that. So I ask how will cutting government spending help their economic life? What I get is some babble about putting our grandchildren in debt and we can’t go on this way and we have to cut back.
Then I ask, “Well, if we all cut back: you, me, the government, business, everyone - what will happen to our economy?”
I usually get no answer.
I think it is because they really don’t understand the concept of an economy much less what makes it work.
What is clear to me is that when these conservatives get into a position of power they do a bunch of things they never talked about in their campaigns (like attacking women’s rights) and then they lie about what they are doing and have done while in office. It’s the breadth of the lying that I find - unbelievable, and yet it seems to reinforce their base constituents, who never seem to compare facts with what is being said. In other words they never look at reality.

Gotta go, I think someone is here to buy that bridge I was selling on eBay.

Bryce

Monday, February 04, 2013

Hadoop


So this is how fb is doing it:
http://www.wired.com/wiredenterprise/2013/02/facebook-data-team/



http://www.wired.com/wiredenterprise/2011/10/how-yahoo-spawned-hadoop/


http://www.wired.com/wiredenterprise/2013/02/facebook-data-team/

http://www.cloudera.com/content/cloudera/en/resources.html

I don't understand any of this yet but I had been wondering for several years how fb was managing all their data (and google and others).

We ran into this kind of problem at Cray with the Unix file system (same for Linux.)
In the unix file system there is a top down inverted tree structure, i.e. Everything starts from the "root" and descends.
As your file system gets larger more and more requests are made on the root (call it the root directory, the root inode, whatever.)
Also, the directories (folders) directly below that directory, get lots of request too. Mostly for reading but occasionally for writing.
Cray got around this problem by duplicating these oft used directories. But you had to make sure most were read requests. If you got a write request you had to freeze all the blocks while they all got written.
So Cray delayed the inevitable problem.

When I was at Sybase we had similar issues with our databases but the idea was to make sure the database stayed consistent and if that meant freezing out some requests so be it, or at least make them wait while you updated everything. Oracle had the same problem, not that their salesmen would know. 

The problem with relational databases like Sybase, Oracle, and mysql was that they required a design to meet a specific need. Naturally, someone would  figure out some query for which the database was not designed and then complain when it was slow. Siebel, which ran on top of Oracle, was a giant data structure and their thing was that they could jam any data you had into it and it would work - so they claimed. It ran slowly, very slowly, and it took up gobs and gobs of hardware. I lost my job at Fannie Mae because of this. (I kept asking for some kind of turn around times and upper management, which hired Siebel people to oversee the project, didn't want to hear it - so they got rid of me. It ran so slowly that they couldn't use it and ended up shelving it.)

Several years ago I heard facebook was using some sort of variant on mysql. I couldn't understand how they could solve the massive data problem that they had to be facing. A relational db (rdbms) would have the problems of any relational database and a filesystem would have the problem that we ran into at Cray.

Standard solutions are: more and faster hardware, duplicate and disperse, put critical stuff in memory (ie and in-core solution) for fast access and off load as much i/o as possible to other cpus.
It seems that fb has gone with an in-core solution and somehow distributed it. They could distribute it by local interest. I would guess most people are interested in people who live near them (like kids going to college, they tend to go to places close to home.)

Hadoop seems to be the solution. What it is I don't know. What I find interesting is that all the big data companies are using it. 
So, even though these companies compete perhaps against one another their problems are so similar that the brains behind what they are doing are common as is their code.