Tuesday, 18 June 2013
The tech industry has been buzzing about “big data” for years now. And according to venture capital firm Accel Partners, the excitement around the big data space is not set to die down any time soon; it’s just about to enter a new phase.
Accel is announcing tonight that it has closed on $100 million for a new investment fund called Big Data Fund 2. The fund is the same size as Accel’s first big data focused fund, which launched with $100 million back in November 2011.
As part of the new fund, Accel is also adding QlikView CTO Anthony Deighton and Imperva CEO Shlomo Kramer to its Big Data Fund Advisory Council, which Accel has said is meant to serve as a “guiding light” to help think through investments and track entrepreneurs doing interesting things in the space.
Despite the nearly identical name, Accel’s Big Data Fund 2 will mark a definite shift in focus from the firm’s first big data fund, partner Jake Flomenberg said in a phone call today. “Over the past few years, we’ve focused a tremendous amount of attention on what people like to call the ‘three Vs’ of big data: variety, volume and velocity,” he said. “We now believe there is a fourth V, which is end user value, and that hasn’t been addressed to the same extent.” That fourth V is where Big Data Fund 2 will focus the bulk of its investment and attention.
Specifically, Accel believes that the “last mile” for big data will be served largely by startups focused on data-driven software, or “DDS.” These startups have largely been made possible by the hardware and infrastructure technology innovations that defined big data’s first wave, Flomenberg says. In a prepared statement from Accel, Facebook engineering VP Jay Parikh, who also serves on Accel’s Big Data Advisory Council, explained it like this:
“The last mile of big data will be built by a new class of software applications that enable everyday users to get real value out of all the data being created. Today’s entrepreneurs are now able to innovate on top of a technology stack that has grown increasingly powerful in the last few years – enabling product and analytical experiences that are more personalized and more valuable than ever.”
One example Flomenberg pointed to of a “fourth V” DDS startup is RelateIQ, the “next generation relationship manager” software startup which launched out of stealth last week with some $29 million in funding from Accel and others.
Accel’s existing portfolio of big data investments also includes Cloudera, Couchbase, Lookout, Nimble Storage, Opower, Prismatic, QlikView, Sumo Logic, and Trifacta.
Sunday, 9 June 2013
Thursday, 6 June 2013
VMware has revealed its VMware vCloud Hybrid Service, an infrastructure as a service (IaaS) platform.
“VMware’s mission is to radically simplify IT and help customers transform their IT operations,” said Pat Gelsinger, CEO of VMware.
“Today, with the introduction of the VMware vCloud Hybrid Service, we take a big step forward by coupling all the value of VMware virtualisation and software-defined data centre technologies with the speed and simplicity of a public cloud service that our customers desire.”
vCloud Hybrid Service will extend VMware software, currently being used by hundreds of thousands of customers, into the public cloud. This means customers will be able to utilise the same skills, tools, networking and security models across both on-premise and off-premise environments.
“As a source of competitive advantage for our international business, our operations and IT department needs the agility and efficiency the public cloud promises,” says Julio Sobral, senior VP of business operations at Fox International.
“However, we don’t have the luxury of starting from scratch; we see in the vCloud Hybrid Service a potential solution to enable Fox International to have a more elastic platform that will support future deployments around the world. Working with technology partners like VMware gives us the best of both worlds by extending our existing infrastructure to realise the benefits of public cloud.”
According to the company, the vCloud Hybrid Service will allow customers to extend their data centres to the cloud and will support thousands of applications and more than 90 operating systems that are certified to run on vSphere. This means customers can get the same level of availability and performance running in the public cloud, without changing or rewriting their applications.
Built on vSphere, vCloud Hybrid Service offers automated replication, monitoring and high availability for business-critical applications, leveraging the advanced features of vSphere, including VMware vMotion, High Availability and vSphere Distributed Resource Scheduler.
“Our new VMware vCloud Hybrid Service delivers a public cloud that is completely interoperable with existing VMware virtualised infrastructure,” said Chris Norton, regional director at VMware for southern Africa.
“By taking an ‘inside-out’ approach that will enable new and existing applications to run anywhere, this service will bridge the private and public cloud worlds without compromise.”
According to VMware, the vCloud Hybrid Service will be available this month through an early access programme.
Monday, 3 June 2013
Hadoop is the poster child for Big Data, so much so that the open source data platform has become practically synonymous with the wildly popular term for storing and analyzing huge sets of information.
While Hadoop is not the only Big Data game in town, the software has had a remarkable impact. But exactly why has Hadoop been such a major force in Big Data? What makes this software so damn special - and so important?
Sometimes the reasons behind something's success can be staring you right in the face. For Hadoop, the biggest motivator in the market is simple: Before Hadoop, data storage was expensive.
Hadoop, however, lets you store as much data as you want in whatever form you need, simply by adding more servers to a Hadoop cluster. Each new server (which can be commodity x86 machines with relatively small price tags) adds more storage and more processing power to the overall cluster. This makes data storage with Hadoop far less costly than prior methods of data storage.
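To make the "just add servers" point concrete, here is a minimal sketch of how a cluster's totals grow as identical machines are added. The per-server disk size and core count are hypothetical, chosen only for illustration; the replication factor of 3 is HDFS's default, which means usable space is roughly a third of raw disk.

```python
# Illustrative sketch: per-server figures are hypothetical, not from the article.
HDFS_DEFAULT_REPLICATION = 3  # HDFS stores three copies of each block by default


def cluster_totals(nodes: int, raw_tb_per_node: float, cores_per_node: int,
                   replication: int = HDFS_DEFAULT_REPLICATION) -> dict:
    """Aggregate capacity of a cluster built from identical commodity servers."""
    return {
        # Every block is stored `replication` times, so usable space is raw / copies.
        "usable_tb": nodes * raw_tb_per_node / replication,
        # Processing power grows with the number of machines, too.
        "cores": nodes * cores_per_node,
    }


for n in (10, 20, 40):
    print(n, "servers:", cluster_totals(n, raw_tb_per_node=12.0, cores_per_node=8))
```

Doubling the number of servers doubles both storage and compute, which is the linear, buy-as-you-go growth the paragraph above describes.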
Spendy Storage Created The Need For Hadoop
We're not talking about archiving, which is just putting data onto tape. Companies need to store increasingly large amounts of data and be able to easily get to it for a wide variety of purposes. That kind of data storage was, in the days before Hadoop, pricey.
And oh, what data there is to store. Enterprises and smaller businesses are trying to track a slew of data sets: emails, search results, sales data, inventory data, customer data, click-throughs on websites. All of this and more is coming in faster than ever before, and trying to manage it all in a relational database management system (RDBMS) is a very expensive proposition.
Historically, organizations trying to manage costs would sample that data down to a smaller subset. This down-sampled data would automatically carry certain assumptions, number one being that some data is more important than other data. For example, a company depending on e-commerce data might prioritize its data on the (reasonable) assumption that credit card data is more important than product data, which in turn would be more important than click-through data.
Assumptions Can Change
That's fine if your business is based on a single set of assumptions. But what happens if the assumptions change? Any new business scenarios would have to use the down-sampled data still in storage, the data retained based on the original assumptions. The raw data would be long gone, because it was too expensive to keep around. That's why it was down-sampled in the first place.
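A toy sketch can make the problem concrete. The records below are hypothetical and simply mirror the article's e-commerce example: a store that kept only the "important" types at write time cannot answer a later question about click-throughs, while a store that kept everything can.

```python
# Hypothetical event records mirroring the article's e-commerce example.
raw_events = [
    {"type": "credit_card", "order_id": 1, "amount": 40.0},
    {"type": "product", "sku": "A-100", "views": 3},
    {"type": "click_through", "page": "/sale", "referrer": "email"},
    {"type": "click_through", "page": "/sale", "referrer": "search"},
]

# The assumption baked in at write time: only these types are worth paying to keep.
KEEP_TYPES = {"credit_card", "product"}
down_sampled = [e for e in raw_events if e["type"] in KEEP_TYPES]


def clicks_by_referrer(events):
    """A new business question asked later: where do click-throughs come from?"""
    counts = {}
    for e in events:
        if e["type"] == "click_through":
            counts[e["referrer"]] = counts.get(e["referrer"], 0) + 1
    return counts


print(clicks_by_referrer(raw_events))    # {'email': 1, 'search': 1}
print(clicks_by_referrer(down_sampled))  # {} because those rows were discarded
```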
Expensive RDBMS-based storage also led to data being siloed within an organization. Sales had its data, marketing had its data, accounting had its own data and so on. Worse, each department may have down-sampled its data based on its own assumptions. That can make it very difficult (and misleading) to use the data for company-wide decisions.
Hadoop: Breaking Down The Silos
Hadoop's storage layer, the Hadoop Distributed File System (HDFS), spreads data across the servers in a cluster and keeps track of where each piece of it sits. The tools that process that data are also distributed and usually run on the same servers where the data is housed, so the data doesn't have to travel across the network before it can be processed. That makes for faster data processing.
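The idea of storing blocks across servers and then running work where the data already lives can be sketched in a few lines. This is a simplified model, not Hadoop's code or API: the 64 MB block size (the Hadoop 1.x default) and 3 replicas (the HDFS default) are assumptions, the node names are hypothetical, and real HDFS uses a rack-aware placement policy rather than the round-robin used here.

```python
import math

# Assumptions for illustration: 64 MB blocks and 3 replicas; node names are hypothetical.
BLOCK_SIZE_MB = 64
REPLICATION = 3


def num_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """A file is split into fixed-size blocks; the last one may be partial."""
    return math.ceil(file_size_mb / block_size_mb)


def place_blocks(n_blocks: int, nodes: list, replication: int = REPLICATION) -> dict:
    """Simplified round-robin placement: each block lands on `replication`
    distinct nodes (assumes len(nodes) >= replication)."""
    return {
        b: [nodes[(b + i) % len(nodes)] for i in range(replication)]
        for b in range(n_blocks)
    }


def schedule_local(placement: dict) -> dict:
    """Run each block's task on a node that already stores that block,
    choosing the least-loaded replica holder so no data crosses the network."""
    load = {}
    assignment = {}
    for block, holders in placement.items():
        node = min(holders, key=lambda n: load.get(n, 0))
        assignment[block] = node
        load[node] = load.get(node, 0) + 1
    return assignment


nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_blocks(num_blocks(300), nodes)
print(placement)
print(schedule_local(placement))
```

Because every task is assigned to a server that holds a copy of its block, the processing step reads from local disk, which is the data-locality benefit the paragraph above describes.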
Hadoop, then, allows companies to store data much more cheaply. How much more cheaply? In 2012, RainStor estimated that running a 75-node, 300TB Hadoop cluster would cost $1.05 million over three years. In 2008, Oracle sold a database with a little over half that storage (168TB) for $2.33 million, and that's not including operating costs. Throw in the salary of an Oracle admin at around $95,000 per year, and you're talking about an operational cost of $2.62 million over three years: 2.5 times the cost, for just over half the storage capacity.
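The arithmetic behind that comparison is easy to check. The sketch below uses only the figures quoted above and takes them at face value; it doesn't try to account for costs the estimates may leave out.

```python
# Figures as quoted in the article (USD, three-year totals).
HADOOP_3YR = 1_050_000        # RainStor estimate: 75 nodes, 300 TB
HADOOP_TB = 300
ORACLE_LICENSE = 2_330_000    # 168 TB system, 2008 price
ORACLE_ADMIN_PER_YEAR = 95_000
ORACLE_TB = 168
YEARS = 3

oracle_3yr = ORACLE_LICENSE + ORACLE_ADMIN_PER_YEAR * YEARS
hadoop_per_tb = HADOOP_3YR / HADOOP_TB
oracle_per_tb = oracle_3yr / ORACLE_TB

print(f"Oracle 3-year cost: ${oracle_3yr:,}")
print(f"Total cost ratio:   {oracle_3yr / HADOOP_3YR:.2f}x")
print(f"Hadoop per TB:      ${hadoop_per_tb:,.0f}")
print(f"Oracle per TB:      ${oracle_per_tb:,.0f}")
print(f"Per-TB ratio:       {oracle_per_tb / hadoop_per_tb:.2f}x")
```

The Oracle total works out to $2,615,000, or about 2.49 times the Hadoop estimate (the article's "2.5 times"). Measured per terabyte, the gap is wider: roughly $3,500 for Hadoop against about $15,565 for Oracle, or close to 4.45 times.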
Savings on this scale mean Hadoop lets companies afford to hold all of their data, not just the down-sampled portions. Fixed assumptions don't need to be made in advance. All data becomes equal and equally available, so business scenarios can be run with raw data at any time as needed, without limitation or assumption. This is a very big deal, because if no data needs to be thrown away, any data model a company might want to try becomes fair game.
That scenario is the next step in Hadoop use, explained Doug Cutting, Chief Architect of Cloudera and an early Hadoop pioneer. "Now businesses can add more data sets to their collection," Cutting said. "They can break down the silos in their organization."
More Hadoop Benefits
Hadoop also lets companies store data as it comes in - structured or unstructured - so you don't have to spend money and time configuring data for relational databases and their rigid tables. Since Hadoop can scale so easily, it can also be the perfect platform to catch all the data coming from multiple sources at once.
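Storing data "as it comes in" and deciding how to interpret it later is often called schema on read; tools in the Hadoop ecosystem such as Hive work this way. The plain-Python sketch below is not Hadoop code: the feed is hypothetical and the parsing rules are deliberately naive. It only illustrates the pattern of keeping mixed raw records untouched and imposing structure at read time.

```python
import json

# Hypothetical raw feed: records arrive in different shapes and are kept exactly as received.
raw_store = [
    '{"user": "ann", "event": "click", "page": "/home"}',
    "bob,purchase,19.99",
    "support email: my order never arrived",
]


def read_with_schema(line: str) -> dict:
    """Impose structure only when a record is read ("schema on read")."""
    if line.startswith("{"):
        return {"kind": "json", **json.loads(line)}
    fields = line.split(",")
    if len(fields) == 3:  # naive check; fine for this sketch, not for real CSV
        user, event, amount = fields
        return {"kind": "csv", "user": user, "event": event, "amount": float(amount)}
    return {"kind": "text", "body": line}


parsed = [read_with_schema(line) for line in raw_store]
for record in parsed:
    print(record)
```

Nothing had to be designed up front: if a new question needs a different interpretation, only `read_with_schema` changes, and the raw records stay as they were.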
Hadoop's most touted benefit is its ability to store data much more cheaply than can be done with RDBMS software. But that's only the first part of the story. The capability to catch and hold so much data so cheaply means businesses can use all of their data to make more informed decisions.