With Hadoop approaching its 10th anniversary, the good people at @IBMbigdata organized an open Twitter chat to explore how it has shaped up so far and discover where it’s heading.
Special guests for the chat were Mike Gualtieri (@mgualtieri), principal analyst covering big data strategy and Hadoop at Forrester; Jeff Kelly (@jeffreyfkelly), principal research contributor at The Wikibon Project and a contributing editor at SiliconANGLE; and James Kobielus (@jameskobielus), big data evangelist and senior program director at IBM. Throughout the discussion, the moderator asked nine questions.
1. Nearly 10 years after its creation, how mature is Hadoop in terms of adoption, functionality, etc.?
Mike: Hadoop is ready for adoption, but most firms are still doing PoCs. Production momentum is building rapidly, though. There is a sweet spot for Hadoop in every large enterprise, and rapid innovation in the ecosystem is making the case stronger. Fast, compliant SQL on Hadoop will allow enterprises to use existing analytical tools without modification. Non-SQL use cases on Hadoop are big too – advanced analytics, data lakes, etc.
Jeff: Still very early days for Hadoop – most deployments, even among early adopters, are still in the PoC/pilot phase. Hadoop is nearing a tipping point among early adopters who are getting ready to move to production. Enterprise-grade capabilities – backup, HA, security – are a top concern of Hadoop admin practitioners.
James: Hadoop has achieved significant enterprise adoption, hence “mature” in that respect. Functionality-wise, it is on the maturity cusp: the Apache open-source distro is a full in-database analytics MPP platform for big data and cognitive computing. In terms of commercial apps, dev tools, management tools, appliances, etc., Hadoop has a rich, vibrant marketplace – IBM et al. In terms of the maturity of enterprise IT skills and practices surrounding Hadoop, many users are short-staffed and bootstrapping.
2. Will YARN and Hadoop 2.0 obsolete MapReduce and Hadoop 1.0?
Jeff: No – MapReduce has its role for non-real-time workloads; like with all tech, practitioners must use the right tool for the right job. That said, YARN enables Hadoop to support multiple application types and data processing capabilities – a big step forward. YARN has huge potential, but we still need developers to build apps that take advantage of the new capabilities.
James: YARN and Hadoop 2.0 are not obsoleting MapReduce and Hadoop 1.0; they are evolving it into a more flexible model and execution platform for big data analytics. For starters, MapReduce and HDFS from Hadoop 1.0 will remain core infrastructure for many big data apps involving hybrid architectures. What YARN enables is the incorporation of R and other programming languages/paradigms beyond MapReduce into Hadoop projects.
Craig: Hadoop 2.x evolves toward the “data grid” concept: host data once and re-use it many times based on business requirements and solution fit.
3. What do you think Hadoop needs most: more features, ease of use, wider adoption, an image makeover, etc.?
Mike: Hadoop needs 1) a stronger ecosystem of data management and 2) clarity on how it fits architecturally in heterogeneous environments.
Jeff: Hadoop practitioners stress enterprise-grade backup, high availability, and security as key barriers to production deployment. Application development tooling is another key feature needed to spur the creation of apps that help business people solve business problems.
James: Hadoop doesn’t “need” anything. Users do the needing. Their needs are for faster, more efficient, cheaper big data solutions. The big data universe is increasingly hybrid: Hadoop has its core use cases, NoSQL has its own, in-memory has its own, RDBMS has its own, etc. Within the context of the hybrid big data cloud, Hadoop’s sweet-spot use cases need to be emphasized; it is not optimal for everything. Within today’s innovative big data arena, in-memory, streaming, and graph databases (e.g., Spark, Streams) hit sweet spots beyond Hadoop.
David: There’s still an educational issue for Hadoop to show SMBs in particular how they can use it – not just big enterprises.
4. Is Hadoop’s reputation as too complex still accurate, or just a bad rap that refuses to die?
Mike: Hadoop is not too complex for me, and most application development professionals can learn it easily. Perhaps technology professionals struggle to understand the use cases for Hadoop. Download Hadoop and run the “word count” example – a piece of cake if you are a Java developer. Fast, compliant SQL on Hadoop will make the power of Hadoop more accessible to non-developers.
Jeff: Hadoop is still complex compared to mature DW/RDBMS – but it’s only been around for 9 years – still just a kid! With each iteration Hadoop gets easier to manage and consume, and the cloud could also play a role in abstracting away complexity. Also beware vendors with an interest in maintaining the status quo injecting FUD into the Hadoop complexity conversation.
Craig: Hadoop is a mind shift from traditional data management efforts; the push-back comes from core data warehousing.
Franz: Straightforward, simple use cases by industry are probably the best remedy for that.
George: There are two parallel threads of understanding Hadoop: dev and IT. Dev loves the new tools; IT hates the unknowns.
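Mike’s “word count” suggestion refers to the canonical first Hadoop program. As a rough illustration of the map/shuffle/reduce flow it follows, here is a minimal plain-Java sketch – it mimics the phases of Hadoop’s WordCount example without the Hadoop runtime, classes, or cluster (the class and method names here are illustrative, not Hadoop APIs):

```java
import java.util.*;
import java.util.stream.*;

// A plain-Java sketch of the MapReduce "word count" flow:
// map each line to (word, 1) pairs, then group by word and sum the counts.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a line.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1));
    }

    // Shuffle + reduce phase: group pairs by key and sum the counts per word.
    static Map<String, Integer> reduce(Stream<Map.Entry<String, Integer>> pairs) {
        return pairs.collect(Collectors.toMap(
                Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        return reduce(lines.stream().flatMap(WordCountSketch::map));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Hadoop is here", "Hadoop is ten");
        // Counts: hadoop=2, is=2, here=1, ten=1 (map iteration order may vary)
        System.out.println(wordCount(lines));
    }
}
```

In real Hadoop, the map and reduce steps run as distributed Mapper and Reducer tasks over HDFS input splits, and the framework handles the shuffle between them – which is exactly the part this single-JVM sketch elides.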
5. What’s the application sweet spot for Hadoop vs. NoSQL vs. in-memory databases?
Mike: Use Hadoop to 1) cost effectively break down data silos 2) do advanced analytics on large, gnarly data sets.
James: Hadoop sweet spots: data curation, sandbox/exploration, unstructured analysis, at-rest machine learning, queryable archive. NoSQL sweet spots: unstructured data analysis, Internet of Things/machine-data analysis, natural language processing. In-memory database sweet spots: fast query, interactive exploration, in-motion analytics, etc.
6. What has the brighter future: the Hadoop appliance market or Hadoop in the cloud?
Mike: I love Hadoop appliances and I love Hadoop in the cloud! Both have great futures. Cloud = elastic; appliance = performance.
Jeff: In the mid-term (next 5 to 10 years) the Hadoop appliance form factor will appeal to many mainstream enterprises, but in the long term (10 to 20 years out) the cloud will play a central role in big data architecture and the consumption of insights.
James: Federated hybrid clouds – on-premises using appliances, off-premises using public cloud/SaaS – will be prevalent. Both markets are bright. The marketplace dynamic affecting the entire big data/Hadoop arena is bigger, faster, cheaper. Clouds enable that and will predominate.
George: Appliances are promising for real-time and DW 2.0 (3.0?); the cloud is inevitable for Internet of Things and crowd-sourced data apps.
7. Is the Hadoop space ready for true standardization? If so, where?
Jeff: There are still competing approaches to Hadoop, but most have the open core in common, which would be a natural area for standardization. It’s starting to happen, but competing vendors have their own approaches at different layers of the stack.
James: Hadoop is indeed ready for true standardization: in SQL, metadata, APIs, MapReduce, etc. The Hadoop ecosystem needs to standardize on a reference architecture that breaks subprojects/functionality into service layers. Contrary to what some Hadoop players have insisted, the open-source process hasn’t delivered ubiquitous multivendor interoperability. It’s a stubborn open issue.
8. What will be Hadoop’s principal commercial use case in the year 2020?
Jeff: Hadoop is a general-purpose, multi-app platform – its uses will span verticals by 2020.
James: In the year 2020, Hadoop will be the predominant unstructured data curation platform. Hadoop’s core role in unstructured data curation will make it the 2020 go-to for cognitive computing in social, mobile, cloud, and IoT.
George: A data management platform (file system, data streaming). Most cool dev and analytics tools will live outside Hadoop.
9. What was the biggest surprise at Hadoop Summit (2014)?
James: The biggest surprise (on the disappointment side of the ledger) was the continued lack of broad Hadoop industry focus on standards. Another big surprise (on the positive side) from Hadoop Summit was that most vendors are doing well and innovation remains strong.