
Tolerance Data 2013 Multilanguage



To use the new version of Amazon CloudSearch, you need to recreate your existing domains with the 2013-01-01 API and re-upload your data. For more information, see Migrating to the 2013-01-01 API in the Amazon CloudSearch Developer Guide.


The Amazon CloudSearch 2013-01-01 API offers several new features, including support for multiple languages, highlighting search terms in the results, and getting suggestions. To use these features, you create and configure a new 2013-01-01 search domain, modify your data pipeline to populate the new domain using the 2013-01-01 data format, and update your query pipeline to submit requests in the 2013-01-01 request format. This migration guide summarizes the API changes and highlights the ones that are most likely to affect your application.
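The 2013-01-01 data format describes each batch as a JSON array of add and delete operations. As a minimal sketch, a batch file such as data1.json could be generated as shown below; the document ids and field names ("title", "genres") are placeholders and must match the index fields configured on the new domain.

import json

# Sketch of a 2013-01-01 document batch: a JSON array of operations.
# Ids and field names are placeholders for illustration only.
batch = [
    {
        "type": "add",              # add or update a document
        "id": "doc1",
        "fields": {
            "title": "Example title",
            "genres": ["Adventure", "Drama"],
        },
    },
    {
        "type": "delete",           # remove a document by id
        "id": "doc2",
    },
]

with open("data1.json", "w") as f:
    json.dump(batch, f, indent=2)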








Post the document batches to your 2013-01-01 domain's doc endpoint. Note that you must specify the 2013-01-01 API version. For example, the following request posts the batch contained in data1.json to the doc-movies-123456789012.us-east-1.cloudsearch.amazonaws.com endpoint.
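The request itself is not reproduced on this page; as a minimal sketch, an equivalent request can be issued from Python with the requests library. This assumes the doc endpoint shown above and an unsigned POST, so domains that restrict access would need signed requests instead.

import requests

# Doc endpoint taken from the example above; the API version is part of the path.
DOC_ENDPOINT = "doc-movies-123456789012.us-east-1.cloudsearch.amazonaws.com"

with open("data1.json", "rb") as batch:
    response = requests.post(
        f"https://{DOC_ENDPOINT}/2013-01-01/documents/batch",
        data=batch,
        headers={"Content-Type": "application/json"},
    )

response.raise_for_status()
print(response.json())  # reports how many adds/deletes were applied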


The 2013-01-01 API supports prescaling your domain to increase upload capacity. If you have a large amount of data to upload, configure your domain's scaling options and select a larger desired instance type. Moving to a larger instance type enables you to upload batches in parallel and reduces the time it takes for the data to be indexed. For more information, see Configuring Scaling Options in Amazon CloudSearch.
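As a rough sketch, the scaling options can also be set programmatically. The snippet below assumes the boto3 cloudsearch client and its update_scaling_parameters operation; the domain name and instance type are placeholders.

import boto3

cloudsearch = boto3.client("cloudsearch", region_name="us-east-1")

# Prescale the domain before a bulk upload: a larger desired instance type
# increases upload capacity and shortens indexing time.
cloudsearch.update_scaling_parameters(
    DomainName="movies",                          # placeholder domain name
    ScalingParameters={
        "DesiredInstanceType": "search.m3.2xlarge",
        "DesiredReplicationCount": 1,
        "DesiredPartitionCount": 1,
    },
)

Once the bulk upload and indexing have finished, the same call can be used to scale the domain back down so you are not paying for unused capacity.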


Singh and Reddy [16] presented a survey on Big Data analytics that distinguishes between horizontal and vertical scaling platforms. For the former class, P2P networks, Hadoop, and Spark are discussed, while for the latter, high-performance computing clusters, multi-core systems, GPUs, and FPGAs are reviewed. They provided a comparison of the different platforms based on parameters such as scalability, data I/O performance, fault tolerance, real-time processing, supported data size, and support for iterative tasks. Finally, the development of a K-Means clustering example is analyzed on the different platforms.


The most used open source framework based on the MapReduce programming model is Apache Hadoop, a general-purpose framework designed to process very large amounts of data in infrastructures with up to tens of thousands of distributed machines. It enables the development of distributed and parallel applications using many programming languages, relieving developers from having to deal with classical distributed computing issues, such as load balancing, fault tolerance, data locality, and network bandwidth saving.
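As a small illustration of the MapReduce model (not Hadoop's own Java API), the word-count sketch below is written in the style of a Hadoop Streaming job, where the mapper and reducer are plain programs reading from standard input and writing to standard output; the file names and the local driver are illustrative only.

import sys
from itertools import groupby

def mapper(lines):
    # Map step: emit a (word, 1) pair for every word in the input chunk.
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    # Reduce step: sum the counts of each word (input is sorted by key).
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Local driver that mimics the shuffle/sort Hadoop performs between the
    # map and reduce phases; on a cluster, chunks are processed in parallel.
    mapped = sorted(mapper(sys.stdin))
    for word, total in reducer(mapped):
        print(f"{word}\t{total}")

Running cat input.txt | python wordcount.py shows the data flow locally; on a cluster, the same mapper and reducer logic would be handed to Hadoop's streaming jar, which takes care of partitioning, shuffling, and fault tolerance.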


Over the years, several other implementations of the MapReduce model have been proposed, such as Phoenix++ [23] and Sailfish [24], but none of these has ever achieved the same success as Hadoop. In particular, Phoenix++ is a C++ implementation that leverages multi-core chips and shared-memory multiprocessors. Its runtime automatically manages thread creation, dynamic task scheduling, data partitioning, and fault tolerance. Sailfish is a MapReduce framework for large-scale data processing that facilitates batch transmission from mappers to reducers to improve performance. An abstraction called I-files is used to support data aggregation, adapting the original model to efficiently batch data written and read by multiple nodes.


Hadoop is designed for exploiting data parallelism during the map and reduce steps: input data is partitioned into chunks and processed by different machines in parallel. Data chunks are replicated on different nodes, ensuring high fault tolerance along with checkpointing and recovery. However, the partitioning strategy does not guarantee efficiency when a large number of small files must be accessed.


Hadoop relies on the Hadoop Distributed File System (HDFS), a distributed file system providing fault tolerance with automatic recovery, portability across heterogeneous, low-cost commodity hardware and operating systems, high-throughput access, and data reliability.
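As a rough sketch of accessing HDFS from Python, the snippet below assumes the pyarrow HadoopFileSystem bindings (which in turn require a local Hadoop client/libhdfs installation) and a reachable NameNode; the host, port, and paths are placeholders.

from pyarrow import fs

# Placeholder NameNode address; block replication across DataNodes is
# handled transparently by HDFS.
hdfs = fs.HadoopFileSystem(host="namenode.example.com", port=8020)

# Write a small file.
with hdfs.open_output_stream("/user/demo/hello.txt") as out:
    out.write(b"hello hdfs\n")

# Read it back.
with hdfs.open_input_stream("/user/demo/hello.txt") as src:
    print(src.read().decode())

# List the directory contents.
for info in hdfs.get_file_info(fs.FileSelector("/user/demo")):
    print(info.path, info.size)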


In particular, thanks to the introduction of YARN (Yet Another Resource Negotiator) in 2013, Hadoop has turned from a batch processing solution into a reference platform for several other programming systems, such as Storm for streaming data analysis; Hama for graph analysis; Hive for querying large datasets; HBase for random, real-time read/write access to data in a non-relational model; Oozie for managing Hadoop jobs; Ambari for provisioning, managing, and monitoring Hadoop clusters; ZooKeeper for maintaining configuration information, naming, and providing distributed synchronization and group services; and more.


MPI provides a low level of abstraction for developing efficient and portable iterative parallel applications, even if performance may be limited by the communication latency between processors. MPI programmers cannot exploit any high-level construct and must manually cope with complex distributed programming issues, such as data exchange, distribution of data across processors, synchronization, and deadlock. These issues make applications hard to debug and complicate the development of end-user parallel applications, where higher-level languages are needed to simplify the developer's task. However, thanks to its portability and the efficiency that comes from a low-level programming model, MPI is widely used and has been implemented on a very large set of parallel and sequential architectures, such as MPP systems, workstation networks, clusters, and Grids. It is worth mentioning that MPI-1 did not make any provision for process creation, which was introduced later in MPI-2 [66]. The current version, MPI-4, provides extensions to better support hybrid programming models and fault tolerance.
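As a small illustration of this message-passing style, the mpi4py sketch below (assuming mpi4py and an MPI runtime are installed, run with e.g. mpiexec -n 4 python sum.py) lets each rank compute a partial sum and combines the results with an explicit collective operation.

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Data distribution is the programmer's responsibility in MPI:
# each process works on its own slice of the problem.
local_sum = sum(range(rank * 1000, (rank + 1) * 1000))

# Explicit communication: reduce all partial sums onto rank 0.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)

if rank == 0:
    print(f"total computed by {size} processes: {total}")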

