Stream Computing is an advanced analytic platform that allows user-developed applications to quickly ingest, analyze and correlate information as it arrives from thousands of real-time sources. The solution can handle very high data throughput rates, up to millions of events or messages per second.
This presentation is an introduction to InfoSphere Streams. First, we position current market challenges in the area of big data. Then we discuss how context-aware stream computing from IBM InfoSphere Streams addresses these challenges. Finally, we present how InfoSphere Streams provides unique value across a range of industries. You can get started now with our InfoSphere Streams Quick Start program and new open source project.
Quick Start: http://www-01.ibm.com/software/data/infosphere/streams/quick-start/
Open Source: https://github.com/IBMStreams
Clients need to move from data management to action based on real-time insight. Speed isn’t just about how fast data is produced or changed, but also about how fast it must be received, understood, and processed. This presentation will outline how to harness fast-moving data inside and outside of your organization.
Your organization needs to shift from management of data to action. Organizations should:
Select valuable data and insights to be stored for further processing
Process and analyze perishable data to take real-time action
Harness and process streaming data such as video, acoustic, thermal, geospatial or sensors
To date, vendors have been overly focused on “how to manage big data?” The market demands something different. Clients are asking: “how to make sense of and analyze big data in real time?” There will be 1 trillion connected things by 2015 and a 3x increase in transistors per human by 2017.
Story to illustrate the problem: For one retail provider, two in every thousand people it hired were arrested for stealing at the very same store. In this example, big data is gathered and stored but not understood.
Story to illustrate the problem: The TSA spent $900 million on behavior detection officers who detected 0 terrorists. In this example, time and resources are spent chasing false positives.
Forrester reports “perishable data represents a huge opportunity” and a 66% increase in streaming analytics since 2012
Lost, forgotten, and unused insight is common
Sources
Forrester Wave: http://www.forrester.com/pimages/rws/reprints/document/113442/oid/1-ROXXEJ
TSA Spend: http://cnsnews.com/news/article/michael-w-chapman/tsa-spent-900-million-behavior-detection-officers-who-detected-0
False positives: http://finance.yahoo.com/news/ponemon-report-reveals-high-cost-110000858.html
Time to Analyze Social Data: https://www.youtube.com/watch?v=JHaA-XS5UkI
Machine Data: http://www.m2m-alliance.com/fileadmin/journal/140630_M2M_Journal.pdf
Enterprise Amnesia: https://www.youtube.com/watch?v=52VWaf0XxNY
Connected Things: http://postscapes.com/internet-of-things-market-size
Our mission is to deliver context-aware stream computing, the next revolution in stream computing. Existing big data technologies need to advance to include context delivered in real time.
What is context-aware stream computing?
Continuously integrate and analyze data in real-time to understand context of everything from people to machines. Leverage this real-time insight to enhance and create more accurate analytical models and fuel cognitive systems. Detect insights (risks and opportunities) in fast data which can only be detected and acted on at a moment’s notice.
The value driver for big data has shifted from volume to velocity.
Big data's initial impact on organizations came in 2012 as the deluge of data crossed its tipping point. Organizations initially aimed big data investments at managing the often overwhelming amount and types of data suddenly available. In our 2012 analytics study, "Analytics: The real-world use of big data," we found that the characteristic differentiating organizations most was a scalable and extensible infrastructure. But just managing the volume and variety of data is no longer enough to outperform competitors.
Organizations using big data technologies broadly throughout their business functions -- capabilities that enable business functions to consume the data rather than just absorb it -- are creating the greatest impacts on business performance. Now we find the components most differentiating organizations creating the most value from data and analytics are those capable of creating an agile and flexible infrastructure, one designed to manage data efficiently and move it through the analytics process quickly.
It’s not cost-effective to store all data, especially data that is of low value or has not yet been deemed valuable (noise)
But it is highly valuable to inspect and analyze all the data, to separate the signal from the noise and determine what needs to be persisted
There is value in identifying the signal after the fact, and offline analysis is still required, but by then you’ve lost the chance to affect the present
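The three points above can be illustrated with a minimal Python sketch. It is not InfoSphere Streams code; the event layout, the SIGNAL_THRESHOLD value and the inspect_stream function are all hypothetical names chosen for illustration. Every event is inspected once as it arrives, only the signal is persisted, and the noise is dropped.

```python
from typing import Iterable, Iterator

# Hypothetical cutoff: readings at or above this value count as "signal".
SIGNAL_THRESHOLD = 90.0


def inspect_stream(events: Iterable[dict], store: list) -> Iterator[dict]:
    """Inspect every event as it arrives; persist and yield only the signal."""
    for event in events:
        if event["value"] >= SIGNAL_THRESHOLD:
            store.append(event)  # persist for later offline analysis
            yield event          # hand over for immediate action
        # Anything else is noise: it was inspected once and is not stored.


# Illustrative data only, not taken from any real system.
readings = [
    {"sensor": "a", "value": 12.0},
    {"sensor": "b", "value": 97.5},
    {"sensor": "a", "value": 40.1},
    {"sensor": "c", "value": 91.0},
]
persisted = []
alerts = list(inspect_stream(readings, persisted))
```

All four readings are analyzed, but only the two that cross the threshold are stored. A real deployment would replace the fixed threshold with whatever analytic decides what is worth keeping.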
As discussed, business imperatives require a real-time response/action based on analyzing all available data continuously. This is challenging, especially given that many data sources, such as GPS data, are constantly changing and very bursty. To meet these requirements, four common tactics are often deployed, but they fall short.
Let’s take the example of a telecommunications provider to understand why these tactics fall short. Telecommunications providers need to improve network quality, prevent dropped calls and improve client satisfaction in real time. However, it isn’t always cost-effective or practical to store and then analyze all enterprise data in a data warehouse or Hadoop system.
Telecommunications providers need to:
Harness and process streaming data sources such as geospatial position and network devices
Continuously analyze and connect different silos of information such as client payment history, geospatial position and network health
Select valuable data and insights to be stored for further processing
Quickly process and analyze perishable data, and take timely action
Each approach outlined on the chart handles a part of the challenge, but not all requirements are addressed.
Business rules use if-then-else logic, but deep analytics are required
Analytic silos provide limited value. It’s more interesting to understand each analytic in the context of the others; for example, does usage history relate to payment history?
Real-time analytic solutions built in house are expensive and less sophisticated. Most organizations don’t have statisticians in house. What about analyzing video, images or sound? Does your organization have this expertise?
Expanding the data warehouse means throwing more data at the problem; without context, this isn’t helpful. Also, can you afford the time it takes to govern a wide variety of data types?
IBM Context-Aware Stream Computing helps organizations optimize decisions and implement repeatable business outcomes across all processes, applications and interactions in the business moment. IBM Context-Aware Stream Computing integrates and analyzes all data (situational, environmental, machine, structured, streaming and more) to anticipate immediate needs and proactively offer enriched, situation-aware content, functions, and experiences to decision management systems (case management, business process management) to trigger real-time action in the business moment. It delivers high-quality insights that reflect opportunity (e.g., personalize customer offerings) or risk (e.g., customers on sanction lists) to trigger the right action, all the time. It enables the discovery of new, more accurate predictive models and more intelligent business operations.
It does this by:
Sensing every data point and event to capture what is happening (inside and outside the firewall)
Putting data and events into context to understand and evaluate how everything relates
Applying real-time analytics to gain best possible insight to decide what is best
Putting that decision into action where it is needed the most – in processes, applications and interactions
The result? The right action in real-time – all the time.
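The four steps listed above (sense, put into context, analyze, act) can be sketched as a small Python pipeline. This is only an illustration under stated assumptions, not the InfoSphere Streams API: the Event type, the CUSTOMER_TIER lookup, the scoring weights and the "offer_credit" action are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    customer_id: str
    dropped_calls: int


# Hypothetical reference data used to put an event into context.
CUSTOMER_TIER = {"c1": "gold", "c2": "standard"}


def sense(raw: dict) -> Event:
    """Step 1: capture the raw data point as a typed event."""
    return Event(raw["customer_id"], raw["dropped_calls"])


def add_context(event: Event) -> dict:
    """Step 2: relate the event to what is already known."""
    tier = CUSTOMER_TIER.get(event.customer_id, "unknown")
    return {"event": event, "tier": tier}


def analyze(ctx: dict) -> float:
    """Step 3: score the event; the weights are illustrative only."""
    weight = 2.0 if ctx["tier"] == "gold" else 1.0
    return weight * ctx["event"].dropped_calls


def act(score: float) -> str:
    """Step 4: turn the score into an action for a downstream system."""
    return "offer_credit" if score >= 4.0 else "no_action"


def handle(raw: dict) -> str:
    """Run one raw event through all four steps."""
    return act(analyze(add_context(sense(raw))))
```

With these assumed weights, two dropped calls trigger an offer for a gold customer but not for a standard one. The same event leads to a different action once context is attached.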
Competitors may claim to provide context-aware stream computing, but IBM’s top four differentiators make IBM the leader. It’s not just IBM saying this; the next slides show reactions from analysts and also benchmarks to try for yourself.
Instantaneous responses are required for stock trading, national security or disease detection. But it is important to realize that a fast response without powerful analytics to back it up is worthless. The way to address the challenge is to continuously perform analytics on data streams all the time. Use statistical models on data in motion that is constantly changing, and respond immediately. This complements existing data-at-rest analytic solutions.
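As one hedged illustration of a statistical model applied to data in motion, the Python sketch below keeps a running mean and standard deviation using Welford's online algorithm and flags values that deviate sharply from what has been seen so far. It is not InfoSphere Streams code; the RunningStats class, the detect_anomalies function, the z-score threshold and the warm-up length are assumptions chosen for illustration.

```python
import math
from typing import Iterable, Iterator


class RunningStats:
    """Running mean and variance via Welford's online algorithm."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self) -> float:
        # Sample standard deviation; 0.0 until at least two values are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def detect_anomalies(values: Iterable[float],
                     z_threshold: float = 3.0,
                     warmup: int = 5) -> Iterator[float]:
    """Yield each value whose z-score against the history so far is too high."""
    stats = RunningStats()
    for x in values:
        sd = stats.stddev()
        if stats.n >= warmup and sd > 0 and abs(x - stats.mean) / sd > z_threshold:
            yield x  # respond immediately, before the value is stored anywhere
        stats.update(x)  # the model keeps adapting as the data changes


anomalies = list(detect_anomalies([10, 11, 9, 10, 12, 10, 50, 11]))
```

Each value is scored against the model as it arrives and the model is then updated in constant time and memory, so nothing has to be written to a warehouse before a response is possible.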
Read more here: https://www.ibmdw.net/streamsdev/2014/04/22/streams-apache-storm/ (Short URL - http://tinyurl.com/kzmhhhj)
For details on faster development environment: http://www.rosebt.com/uploads/8/1/8/1/8181762/infosphere_streams_v2.0.0.3_overview.pdf
Storm is not a viable solution for many situations; the TCO of deploying immature and poorly developed systems can be devastating for clients.
See analyst report for details: http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=XB&infotype=PM&appname=SWGE_IM_IM_USEN&htmlfid=IME14024USEN&attachment=IME14024USEN.PDF
EMA report showcasing InfoSphere Streams as a market leader - http://public.dhe.ibm.com/common/ssi/ecm/en/iml14403usen/IML14403USEN.PDF
InfoSphere Streams outperforms Apache Storm by 2.6 to 12.3 times in terms of throughput while simultaneously consuming 5.5 to 14.2 times less CPU time. Furthermore, the throughput and CPU time gaps widen as data volume, degree of parallelism, and/or number of processing nodes grows.
This benchmark study clearly finds that the InfoSphere Streams architecture for streaming analysis is fundamentally superior to Apache Storm. InfoSphere Streams handles heavy load much better (i.e. it can make more effective use of available CPU capacity). The noticeable performance degradation with Apache Storm on meaningful workloads (typical of streaming analysis) means that the cost of application logic is very high. As a result, Apache Storm is unusable for most production applications such as geospatial analytics, deep network inspection and call data record analysis. The sophisticated and robust engineering of InfoSphere Streams ensures the ability to scale linearly and handle high loads effectively while maintaining a low resource usage footprint. The ability to scale in a near linear way and to efficiently handle high workloads with minimal performance degradation emerged from this benchmark study as the obvious differentiator for InfoSphere Streams.
How to Address the Details of the Software AG Score
IBM has a fundamentally different approach. Rather than create a silo solution, we integrate with an ecosystem of offerings such as business process management and complex event processing. For example, we don’t want InfoSphere Streams to become a visualization platform. Watson Explorer can be used for visualization as well as any other platform of choice. IBM is open and integrated, not closed and operating in a silo. InfoSphere Streams is a context-aware stream computing solution, that is the goal and our strength.
IBM lost points because: No native Windows support
Response: Linux support is the top priority for our clients
IBM lost points because: We don’t offer an exclusive design for public cloud
Response: Streams is available for cloud with monthly licenses, but not exclusively for cloud. We support public, private and hybrid clouds.
IBM lost points because: No extensive business applications
Response: IBM builds mostly custom applications; ISVs manage the packaged apps
IBM lost points because: No business rules engine included in the package
Response: We integrate with WebSphere ODM and others, but have chosen not to bundle
IBM lost points because: No predictive analytics modeling tool
Response: We integrate with IBM SPSS, R and others via PMML, but have chosen not to bundle
IBM lost points because: No out of the box dashboard tools for visualization
Response: We integrate with Cognos, Cognos RTP, SAP Business Objects and DataWatch, but have chosen not to bundle
IBM lost points because: No Business Process Management platform
Response: Our solution wasn’t designed to be a BPM solution; this was a choice. IBM offers BPM separately, and we have chosen not to bundle it. Our offering focuses on real-time analytics.
This slide has been approved by Forrester. You may show the Forrester Wave graphic to clients as long as the disclaimers are there and you make NO CHANGES to this slide. If you use this slide inside another presentation, you must keep approval from Forrester. For help with this, please contact Kimberly Madia kmadia@us.ibm.com 1-720-396-5281
In a survey of more than 1,000 big data developers, analyst firm Evans Data Corporation found that IBM is the leading provider of Hadoop among developers, with more than 25 percent of respondents identifying IBM's Hadoop as their principal distribution. The survey also focused on key growth areas such as machine learning and streaming analytics, where 20 percent of developers cited IBM InfoSphere Streams as their preferred platform for stream processing, making it the most popular choice in the category.
The state of big data is changing. “Hadoop” now refers to an ecosystem of offerings, with streaming analytics among the most critical components. Why? Because clients need to move faster.
InfoSphere Streams is well ahead of open source. Here are the top 10 reasons we want you to share with clients.
1. The market leader; see the Forrester Report and EMA Analyst Report
http://w3-03.ibm.com/software/spcn/content/P370074U93659Q69.html
http://w3-103.ibm.com/software/spcn/content/P998164W72039L82.html
2. Superior performance and scalability; See Benchmark Streams vs. Storm
http://w3-103.ibm.com/software/spcn/content/B733880N34173G93.html
3. IBM focuses on productivity for the developer and deep analytics for the business, enabling faster time to market and greater value. Streams includes the Streams Studio IDE, which provides many productivity features, including a drag-and-drop editor and multiple wizards to guide you through development tasks. It also includes hundreds of operators and specialized functions to speed up development.
4. Ease of operation: Streams includes graphical tooling for installation and administration such as the instances manager and the Streams console. The operators provided include metrics that can be monitored to detect any issues with the application. Overall, Streams provides a complete view of the cluster for monitoring and managing applications.
5. Low risk: Streams is an IBM-supported product with thriving communities: StreamsDev and Streams on GitHub
https://developer.ibm.com/streamsdev/
https://github.com/IBMStreams
6. Integration: Streams integrates with multiple IBM products such as BigInsights, SPSS Modeler, Cognos, Operational Decision Manager, Watson Explorer, DB2, Informix, and Netezza, just to name a few. It also integrates with other commercial and open-source products such as Oracle and Apache ActiveMQ.
7. IBM has a superior business model. Flexible options to meet client needs: Quick Start, Developer Edition, Product Edition, and Cloud. Flexible pricing: monthly and perpetual.
8. Enterprise tested, 20+ case studies and videos
9. Optimized workloads, purpose built streaming engine
10. Analytics – natural language processing, geospatial, time series and more.
Clients need to move from managing data to making sense of big data. InfoSphere Streams brings speed, analytics and context to data to drive faster, more accurate decisions in the business moment. Plus, it is bundled with IBM InfoSphere BigInsights, the enterprise-grade Hadoop offering from IBM.
IBM is a dominant market player with the most implementation experience across the big data ecosystem.
Now clients can make sense of big data in the business moment. The benefits include:
Real-time actionable insight: Enable higher quality decisions faster. Determine the next best action based on up-to-the-second observations, while the event/transaction is still happening e.g., the perfect web page advertisement
Better focused human attention: Make sure the top priorities are truly the most valuable every moment of the day
Detect new/emerging patterns: Enhance the accuracy of analytical and cognitive systems. Draw on richness of real-time data in context to ensure analytical and cognitive systems are able to discover new and emerging patterns, previously unforeseeable
These are just a sampling of the many client examples of InfoSphere Streams in action. We have more references, videos and case studies later in the presentation.
External link to references: http://www-01.ibm.com/software/data/infosphere/streams/resources.html
Internal link to CRDB: https://w3-03.sso.ibm.com/sales/support/apilite.wss?appname=crmd&mostrecentsort=yes&crv=no&additional=summary&alldocs=TRUE&cras_software=%22InfoSphere%20Streams%22&infotype=CR&others=RFCS%20RFVI%20RFWN
There are many resources for additional reading. Explore both business and technical resources. All resources are publicly accessible.