Posts

Showing posts from October 2013

HBase – Overview of Architecture and Data Model

Introduction

HBase is a column-oriented database and an open-source implementation of Google's Bigtable storage architecture. It can manage structured and semi-structured data and has built-in features such as scalability, versioning, compression and garbage collection. Because it uses write-ahead logging and distributed configuration, it can provide fault tolerance and quick recovery from individual server failures. HBase is built on top of Hadoop / HDFS, and the data stored in HBase can be manipulated using Hadoop's MapReduce capabilities. Let's now take a look at how HBase (a column-oriented database) differs from some other data structures and concepts that we are familiar with.

Row-Oriented vs. Column-Oriented Data Stores

In a row-oriented data store, a row is the unit of data that is read or written together. In a column-oriented data store, the data in a column is stored together and can therefore be retrieved quickly. Row-oriented data stores: Data i...
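To make the row-oriented vs. column-oriented distinction concrete, here is a minimal Python sketch. It is only a toy illustration of the two layouts, not HBase's actual storage format; the records, the field names and the helper functions `to_row_store` and `to_column_store` are all hypothetical.

```python
# Toy records; the field names and values are made up for illustration.
records = [
    {"id": 1, "name": "alice", "age": 30},
    {"id": 2, "name": "bob", "age": 25},
    {"id": 3, "name": "carol", "age": 35},
]


def to_row_store(rows):
    """Row-oriented layout: all values of one row are kept together."""
    return [tuple(row.values()) for row in rows]


def to_column_store(rows):
    """Column-oriented layout: all values of one column are kept together."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


row_store = to_row_store(records)
column_store = to_column_store(records)

# Reading one whole record touches a single entry in the row store.
second_record = row_store[1]

# Aggregating one column touches a single list in the column store,
# without reading the other fields at all.
ages = column_store["age"]
average_age = sum(ages) / len(ages)
```

In this sketch, fetching a full record is a single lookup in the row layout, while an aggregate over one field only needs one list in the column layout. That is the trade-off the passage above describes.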

Tarball installation of CDH4 with YARN on RHEL 5.7

Step 1: Download the tarball from the Cloudera site, or simply click here.

Step 2: Untar the tarball anywhere you like; I did it in my home directory.
$ tar -xvzf hadoop-2.0.0-cdh4.1.2.tar.gz

Step 3: Set the home directory variables in /etc/profile.
export JAVA_HOME=/usr/java/jdk1.6.0_22
export PATH=/usr/java/jdk1.6.0_22/bin:"$PATH"
export HADOOP_HOME=/home/hadoop/hadoop-2.0.0-cdh4.1.2

Step 4: Create a hadoop directory in /etc, then create a symbolic link /etc/hadoop/conf -> $HADOOP_HOME/etc/hadoop.
$ ln -s /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop /etc/hadoop/conf

Step 5: Create the directories listed here.
For the DataNode:
$ mkdir -p ~/dfs/dn1 ~/dfs/dn2
$ mkdir -p /var/log/hadoop
For the NameNode:
$ mkdir -p ~/dfs/nn ~/dfs/nn1
For the SecondaryNameNode:
$ mkdir -p ~/dfs/snn
For the NodeManager:
$ mkdir -p ~/yarn/local-dir1 ~/yarn/local-dir2
$ mkdir -p ~/yarn/apps
For Mapred...

What are the challenges of cloud computing?

Ensuring adequate performance. The inherent limitations of the Internet apply to cloud computing. These performance limitations can take the form of delays caused by demand and traffic spikes, slowdowns caused by malicious traffic or attacks, and last-mile performance issues, among others.

Ensuring adequate security. Many cloud-based applications involve confidential data and personal information. One of the key barriers cloud providers have had to overcome is therefore the perception that cloud-based services are less secure than desktop-based or data center-based services.

Ensuring the costs of cloud computing remain competitive.

As a leading provider of infrastructure for cloud computing, we are uniquely positioned to help customers overcome the challenges of cloud computing and fully realize its many benefits. Our Intelligent Platform consists of more than 100,000 servers all over the world, running securely and delivering a significant percentage of the world’s cloud comput...

Disadvantages of Cloud Computing

Although cloud computing is a great innovation in the field of computing, there are still a number of reasons why people may not want to adopt it for their particular needs.

Dependency
One major disadvantage of cloud computing is the user's dependency on the provider. Internet users don't have their data stored with them.

Risk
Using cloud computing services means relying on remote servers. There is always insecurity regarding stored documents because users do not have control over their software. Nothing can be recovered if the provider's servers go out of service.

Requires a constant Internet connection
The most obvious disadvantage is that cloud computing relies completely on network connections. It makes your business dependent on the reliability of your Internet connection. When it's offline, you're offline. If you do not have an Internet connection, you can't access anything, even your own data. A dead Internet connection means no work. Similarly, a l...

Cloud Computing – Types of Cloud

Cloud computing is usually described in one of two ways: based either on the cloud's location or on the service that the cloud offers.

Based on the cloud's location, we can classify a cloud as public, private, hybrid, or community.

Based on the service that the cloud offers, we speak of either IaaS (Infrastructure-as-a-Service), PaaS (Platform-as-a-Service), SaaS (Software-as-a-Service), or Storage, Database, Information, Process, Application, Integration, Security, Management, or Testing-as-a-Service.

Where Do I Pull the Switch: Cloud Location

A public cloud means that the whole computing infrastructure is located on the premises of the cloud computing company that offers the cloud service. The location thus remains separate from the customer, who has no physical control over the infrastructure. As public clouds use shared resources, they excel mostly in performance, but are also the most vulnerable to various attacks. ...

Benefits of Cloud Computing

Achieve economies of scale. Increase volume output or productivity with fewer people. Your cost per unit, project or product plummets.

Reduce spending on technology infrastructure. Maintain easy access to your information with minimal upfront spending. Pay as you go (weekly, quarterly or yearly), based on demand.

Globalize your workforce on the cheap. People worldwide can access the cloud, provided they have an Internet connection.

Streamline processes. Get more work done in less time with fewer people.

Reduce capital costs. There's no need to spend big money on hardware, software or licensing fees.

Improve accessibility. You have access anytime, anywhere, making your life so much easier!

Monitor projects more effectively. Stay within budget and ahead of completion cycle times.

Less personnel training is needed. It takes fewer people to do more work on a cloud, with a minimal learning curve on hardware and software issues.

Minimize licensing new software...

Introduction to Cloud Computing

Cloud computing, or something being "in the cloud", is an expression used to describe a variety of computing concepts that involve a large number of computers connected through a real-time communication network such as the Internet. Cloud computing is a synonym for distributed computing over a network and means the ability to run a program on many connected computers at the same time. The phrase is more commonly used to refer to network-based services that appear to be provided by real server hardware but are in fact served up by virtual hardware, simulated by software running on one or more real machines. Such virtual servers do not physically exist and can therefore be moved around and scaled up (or down) on the fly without affecting the end user.

Why Do I Need Hadoop?

Too Much Data

Hadoop provides storage for Big Data at reasonable cost
Storing Big Data using traditional storage can be expensive. Hadoop is built around commodity hardware, so it can provide fairly large storage at a reasonable cost. Hadoop has been used in the field at petabyte scale.

Hadoop allows you to capture new or more data
Sometimes organizations don't capture a type of data because storing it is cost-prohibitive. Since Hadoop provides storage at reasonable cost, this type of data can be captured and stored. One example is web site click logs. Because the volume of these logs can be very high, not many organizations captured them. Now, with Hadoop, it is possible to capture and store the logs.

With Hadoop, you can store data longer
To manage the volume of data stored, companies periodically purge older data. For example, only the logs for the last 3 months might be stored, with older logs deleted. With Hadoop it is possible to store t...

Big Data

What is Big Data?

Big Data is a very large, loosely structured data set that defies traditional storage. "Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time." (Wikipedia) It is so big that a single data set may contain a few terabytes to many petabytes of data.

Human Generated Data and Machine Generated Data

Human Generated Data is emails, documents, photos and tweets. We are generating this data faster than ever. Just imagine the number of videos uploaded to YouTube and tweets swirling around. This data can be Big Data too. Machine Generated Data is a new breed of data. This category consists of sensor data and logs generated by 'machines', such as email logs, click stream logs, etc. Machine Generated Data is orders of magnitude larger than Human Generated Data. Before 'Hadoop' was on the scene, th...

HDInsight Installation on Windows Platform

First install Windows 8, then install HDInsight. The HDInsight installer is powered by the Microsoft Web Platform Installer (WPI). To download it, you can use the following link: http://www.microsoft.com/web/gallery/install.aspx?appid=HDINSIGHT-PREVIEW

After installing Microsoft WPI, run it with administrator privileges. Search for Hadoop in the search box. It will locate the HDInsight preview service installer for Windows. Click Add to add the installer to the Microsoft WPI installation cart. Accept the license, click Install, and wait for WPI to do the installation for you. The installer also includes the Hadoop package and IIS components.

Verifying that the HDInsight installation succeeded: four shortcuts will be created on the desktop when the installer is finished. Double-click the Hadoop Command Line icon on the desktop. It should open to the C:\Hadoop\hadoop-1.1.0-SNAPSHOT> promp...