CCA175 Certification Practice Guide, CCA175 Online Training Materials

By blog Admin | Posted Thu, 26 Jul 2018 16:38:31 GMT
Valid CCA175 Dumps shared by newpassleader.com for Helping Passing CCA175 Exam! newpassleader.com now offer the newest CCA175 exam dumps, the newpassleader.com CCA175 exam questions have been updated and answers have been corrected get the newest newpassleader.com CCA175 dumps with Test Engine here:
https://www.newpassleader.com/Cloudera/CCA175-exam-preparation-materials.html
(96 Q&As Dumps, 30%OFF Special Discount: 30free)


NEW QUESTION NO: 10
CORRECT TEXT
Problem Scenario 33 : You have given a files as below.
spark5/EmployeeName.csv (id,name)
spark5/EmployeeSalary.csv (id,salary)
Data is given below:
EmployeeName.csv
E01,Lokesh
E02,Bhupesh
E03,Amit
E04,Ratan
E05,Dinesh
E06,Pavan
E07,Tejas
E08,Sheela
E09,Kumar
E10,Venkat
EmployeeSalary.csv
E01,50000
E02,50000
E03,45000
E04,45000
E05,50000
E06,45000
E07,50000
E08,10000
E09,10000
E10,10000
Now write a Spark code in scala which will load these two tiles from hdfs and join the same, and produce the (name.salary) values.
And save the data in multiple tile group by salary (Means each file will have name of employees with same salary). Make sure file name include salary as well.
Answer: 
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Create all three files in hdfs (We will do using Hue). However, you can first create in local filesystem and then upload it to hdfs.
Step 2 : Load EmployeeName.csv file from hdfs and create PairRDDs
val name = sc.textFile("spark5/EmployeeName.csv")
val namePairRDD = name.map(x=> (x.split(",")(0),x.split('V')(1)))
Step 3 : Load EmployeeSalary.csv file from hdfs and create PairRDDs
val salary = sc.textFile("spark5/EmployeeSalary.csv")
val salaryPairRDD = salary.map(x=> (x.split(",")(0),x.split(",")(1)))
Step 4 : Join all pairRDDS
val joined = namePairRDD.join(salaryPairRDD}
Step 5 : Remove key from RDD and Salary as a Key. val keyRemoved = joined.values
Step 6 : Now swap filtered RDD.
val swapped = keyRemoved.map(item => item.swap)
Step 7 : Now groupBy keys (It will generate key and value array) val grpByKey = swapped.groupByKey().collect()
Step 8 : Now create RDD for values collection
val rddByKey = grpByKey.map{case (k,v) => k->sc.makeRDD(v.toSeq)}
Step 9 : Save the output as a Text file.
rddByKey.foreach{ case (k,rdd) => rdd.saveAsTextFile("spark5/Employee"+k)}
NEW QUESTION NO: 11
CORRECT TEXT
Problem Scenario 34 : You have given a file named spark6/user.csv.
Data is given below:
user.csv
id,topic,hits
Rahul,scala,120
Nikita,spark,80
Mithun,spark,1
myself,cca175,180
Now write a Spark code in scala which will remove the header part and create RDD of values as below, for all rows. And also if id is myself" than filter out row.
Map(id -> om, topic -> scala, hits -> 120)
Answer: 
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Create file in hdfs (We will do using Hue). However, you can first create in local filesystem and then upload it to hdfs.
Step 2 : Load user.csv file from hdfs and create PairRDDs val csv =
sc.textFile("spark6/user.csv")
Step 3 : split and clean data
val headerAndRows = csv.map(line => line.split(",").map(_.trim))
Step 4 : Get header row
val header = headerAndRows.first
Step 5 : Filter out header (We need to check if the first val matches the first header name) val data = headerAndRows.filter(_(0) != header(O))
Step 6 : Splits to map (header/value pairs)
val maps = data.map(splits => header.zip(splits).toMap)
step 7: Filter out the user "myself
val result = maps.filter(map => mapf'id") != "myself")
Step 8 : Save the output as a Text file. result.saveAsTextFile("spark6/result.txt")
NEW QUESTION NO: 12
CORRECT TEXT
Problem Scenario 2 :
There is a parent organization called "ABC Group Inc", which has two child companies named Tech Inc and MPTech.
Both companies employee information is given in two separate text file as below. Please do the following activity for employee details.
Tech Inc.txt
1,Alok,Hyderabad
2,Krish,Hongkong
3,Jyoti,Mumbai
4 ,Atul,Banglore
5 ,Ishan,Gurgaon
MPTech.txt
6 ,John,Newyork
7 ,alp2004,California
8 ,tellme,Mumbai
9 ,Gagan21,Pune
1 0,Mukesh,Chennai
1 . Which command will you use to check all the available command line options on HDFS and How will you get the Help for individual command.
2. Create a new Empty Directory named Employee using Command line. And also create an empty file named in it Techinc.txt
3. Load both companies Employee data in Employee directory (How to override existing file in HDFS).
4. Merge both the Employees data in a Single tile called MergedEmployee.txt, merged tiles should have new line character at the end of each file content.
5. Upload merged file on HDFS and change the file permission on HDFS merged file, so that owner and group member can read and write, other user can read the file.
6. Write a command to export the individual file as well as entire directory from HDFS to local file System.
Answer: 
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Check All Available command hdfs dfs
Step 2 : Get help on Individual command hdfs dfs -help get
Step 3 : Create a directory in HDFS using named Employee and create a Dummy file in it called e.g. Techinc.txt hdfs dfs -mkdir Employee
Now create an emplty file in Employee directory using Hue.
Step 4 : Create a directory on Local file System and then Create two files, with the given data in problems.
Step 5 : Now we have an existing directory with content in it, now using HDFS command line , overrid this existing Employee directory. While copying these files from local file
System to HDFS. cd /home/cloudera/Desktop/ hdfs dfs -put -f Employee
Step 6 : Check All files in directory copied successfully hdfs dfs -Is Employee
Step 7 : Now merge all the files in Employee directory, hdfs dfs -getmerge -nl Employee
MergedEmployee.txt
Step 8 : Check the content of the file. cat MergedEmployee.txt
Step 9 : Copy merged file in Employeed directory from local file ssytem to HDFS. hdfs dfs - put MergedEmployee.txt Employee/
Step 10 : Check file copied or not. hdfs dfs -Is Employee
Step 11 : Change the permission of the merged file on HDFS hdfs dfs -chmpd 664
Employee/MergedEmployee.txt
Step 12 : Get the file from HDFS to local file system, hdfs dfs -get Employee
Employee_hdfs
NEW QUESTION NO: 13
CORRECT TEXT
Problem Scenario 51 : You have been given below code snippet.
val a = sc.parallelize(List(1, 2,1, 3), 1)
val b = a.map((_, "b"))
val c = a.map((_, "c"))
Operation_xyz
Write a correct code snippet for Operationxyz which will produce below output.
Output:
Array[(lnt, (lterable[String], lterable[String]))] = Array(
(2,(ArrayBuffer(b),ArrayBuffer(c))),
(3,(ArrayBuffer(b),ArrayBuffer(c))),
(1,(ArrayBuffer(b, b),ArrayBuffer(c, c)))
)
Answer: 
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
b.cogroup(c).collect
cogroup [Pair], groupWith [Pair]
A very powerful set of functions that allow grouping up to 3 key-value RDDs together using their keys.
Another example
val x = sc.parallelize(List((1, "apple"), (2, "banana"), (3, "orange"), (4, "kiwi")), 2) val y = sc.parallelize(List((5, "computer"), (1, "laptop"), (1, "desktop"), (4, "iPad")), 2) x.cogroup(y).collect
Array[(lnt, (lterable[String], lterable[String]))] = Array(
(4,(ArrayBuffer(kiwi),ArrayBuffer(iPad))),
(2,(ArrayBuffer(banana),ArrayBuffer())),
(3,(ArrayBuffer(orange),ArrayBuffer())),
(1 ,(ArrayBuffer(apple),ArrayBuffer(laptop, desktop))),
(5,{ArrayBuffer(),ArrayBuffer(computer))))
NEW QUESTION NO: 14
CORRECT TEXT
Problem Scenario 73 : You have been given data in json format as below.
{"first_name":"Ankit", "last_name":"Jain"}
{"first_name":"Amir", "last_name":"Khan"}
{"first_name":"Rajesh", "last_name":"Khanna"}
{"first_name":"Priynka", "last_name":"Chopra"}
{"first_name":"Kareena", "last_name":"Kapoor"}
{"first_name":"Lokesh", "last_name":"Yadav"}
Do the following activity
1 . create employee.json file locally.
2 . Load this file on hdfs
3 . Register this data as a temp table in Spark using Python.
4 . Write select query and print this data.
5 . Now save back this selected data in json format.
Answer: 
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : create employee.json tile locally.
vi employee.json (press insert) past the content.
Step 2 : Upload this tile to hdfs, default location hadoop fs -put employee.json
Step 3 : Write spark script
#lmport SQLContext
from pyspark import SQLContext
# Create instance of SQLContext sqIContext = SQLContext(sc)
# Load json file
employee = sqlContext.jsonFile("employee.json")
# Register RDD as a temp table employee.registerTempTablef'EmployeeTab"}
# Select data from Employee table
employeelnfo = sqlContext.sql("select * from EmployeeTab"}
#lterate data and print
for row in employeelnfo.collect():
print(row)
Step 4 : Write dataas a Text file employeelnfo.toJSON().saveAsTextFile("employeeJson1")
Step 5: Check whether data has been created or not hadoop fs -cat employeeJsonl/part"
NEW QUESTION NO: 15
CORRECT TEXT
Problem Scenario 15 : You have been given following mysql database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish following activities.
1. In mysql departments table please insert following record. Insert into departments values(9999, '"Data Science"1);
2. Now there is a downstream system which will process dumps of this file. However, system is designed the way that it can process only files if fields are enlcosed in(') single quote and separate of the field should be (-} and line needs to be terminated by : (colon).
3. If data itself contains the " (double quote } than it should be escaped by \.
4. Please import the departments table in a directory called departments_enclosedby and file should be able to process by downstream system.
Answer: 
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Connect to mysql database.
mysql --user=retail_dba -password=cloudera
show databases; use retail_db; show tables;
Insert record
Insert into departments values(9999, '"Data Science"');
select" from departments;
Step 2 : Import data as per requirement.
sqoop import \
-connect jdbc:mysql;//quickstart:3306/retail_db \
~ username=retail_dba \
--password=cloudera \
-table departments \
-target-dir /user/cloudera/departments_enclosedby \
-enclosed-by V -escaped-by \\ -fields-terminated-by--' -lines-terminated-by :
Step 3 : Check the result.
hdfs dfs -cat/user/cloudera/departments_enclosedby/part"
NEW QUESTION NO: 16
CORRECT TEXT
Problem Scenario 45 : You have been given 2 files , with the content as given Below
(spark12/technology.txt)
(spark12/salary.txt)
(spark12/technology.txt)
first,last,technology
Amit,Jain,java
Lokesh,kumar,unix
Mithun,kale,spark
Rajni,vekat,hadoop
Rahul,Yadav,scala
(spark12/salary.txt)
first,last,salary
Amit,Jain,100000
Lokesh,kumar,95000
Mithun,kale,150000
Rajni,vekat,154000
Rahul,Yadav,120000
Write a Spark program, which will join the data based on first and last name and save the joined results in following format, first Last.technology.salary
Answer: 
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Create 2 files first using Hue in hdfs.
Step 2 : Load all file as an RDD
val technology = sc.textFile(Msparkl2/technology.txt").map(e => e.splitf',")) val salary = sc.textFile("spark12/salary.txt").map(e => e.split("."))
Step 3 : Now create Key.value pair of data and join them.
val joined = technology.map(e=>((e(0),e(1)),e(2))).join(salary.map(e=>((e(0),e(1)),e(2))))
Step 4 : Save the results in a text file as below.
joined.repartition(1).saveAsTextFile("spark12/multiColumn Joined.txt")
NEW QUESTION NO: 17
CORRECT TEXT
Problem Scenario 11 : You have been given following mysql database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish following.
1. Import departments table in a directory called departments.
2. Once import is done, please insert following 5 records in departments mysql table.
Insert into departments(10, physics);
Insert into departments(11, Chemistry);
Insert into departments(12, Maths);
Insert into departments(13, Science);
Insert into departments(14, Engineering);
3. Now import only new inserted records and append to existring directory . which has been created in first step.
Answer: 
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Clean already imported data. (In real exam, please make sure you dont delete data generated from previous exercise).
hadoop fs -rm -R departments
Step 2 : Import data in departments directory.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
-password=cloudera \
-table departments \
"target-dir/user/cloudera/departments
Step 3 : Insert the five records in departments table.
mysql -user=retail_dba --password=cloudera retail_db
Insert into departments values(10, "physics"); Insert into departments values(11,
"Chemistry"); Insert into departments values(12, "Maths"); Insert into departments values(13, "Science"); Insert into departments values(14, "Engineering"); commit; select' from departments;
Step 4 : Get the maximum value of departments from last import, hdfs dfs -cat
/user/cloudera/departments/part* that should be 7
Step 5 : Do the incremental import based on last import and append the results.
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:330G/retail_db" \
~ username=retail_dba \
-password=cloudera \
-table departments \
--target-dir /user/cloudera/departments \
-append \
-check-column "department_id" \
-incremental append \
-last-value 7
Step 6 : Now check the result.
hdfs dfs -cat /user/cloudera/departments/part"
NEW QUESTION NO: 18
CORRECT TEXT
Problem Scenario 23 : You have been given log generating service as below.
Start_logs (It will generate continuous logs)
Tail_logs (You can check , what logs are being generated)
Stop_logs (It will stop the log service)
Path where logs are generated using above service : /opt/gen_logs/logs/access.log
Now write a flume configuration file named flume3.conf , using that configuration file dumps logs in HDFS file system in a directory called flumeflume3/%Y/%m/%d/%H/%M
Means every minute new directory should be created). Please us the interceptors to provide timestamp information, if message header does not have header info.
And also note that you have to preserve existing timestamp, if message contains it. Flume channel should have following property as well. After every 100 message it should be committed, use non-durable/faster channel and it should be able to hold maximum 1000 events.
Answer: 
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Create flume configuration file, with below configuration for source, sink and channel.
#Define source , sink , channel and agent,
agent1 .sources = source1
agent1 .sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1 .sources.source1.type = exec
agentl.sources.source1.command = tail -F /opt/gen logs/logs/access.log
#Define interceptors
agent1 .sources.source1.interceptors=i1
agent1 .sources.source1.interceptors.i1.type=timestamp
agent1 .sources.source1.interceptors.i1.preserveExisting=true
## Describe sink1
agent1 .sinks.sink1.channel = memory-channel
agent1 .sinks.sink1.type = hdfs
agent1 .sinks.sink1.hdfs.path = flume3/%Y/%m/%d/%H/%M
agent1 .sinks.sjnkl.hdfs.fileType = Data Stream
# Now we need to define channel1 property.
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
Agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
Step 2 : Run below command which will use this configuration file and append data in hdfs.
Start log service using : start_logs
Start flume service:
flume-ng agent -conf /home/cloudera/flumeconf -conf-file
/home/cloudera/flumeconf/flume3.conf -DfIume.root.logger=DEBUG,INFO,console -name agent1
Wait for few mins and than stop log service.
stop logs
NEW QUESTION NO: 19
CORRECT TEXT
Problem Scenario 30 : You have been given three csv files in hdfs as below.
EmployeeName.csv with the field (id, name)
EmployeeManager.csv (id, manager Name)
EmployeeSalary.csv (id, Salary)
Using Spark and its API you have to generate a joined output as below and save as a text tile (Separated by comma) for final distribution and output must be sorted by id.
ld,name,salary,managerName
EmployeeManager.csv
E01,Vishnu
E02,Satyam
E03,Shiv
E04,Sundar
E05,John
E06,Pallavi
E07,Tanvir
E08,Shekhar
E09,Vinod
E10,Jitendra
EmployeeName.csv
E01,Lokesh
E02,Bhupesh
E03,Amit
E04,Ratan
E05,Dinesh
E06,Pavan
E07,Tejas
E08,Sheela
E09,Kumar
E10,Venkat
EmployeeSalary.csv
E01,50000
E02,50000
E03,45000
E04,45000
E05,50000
E06,45000
E07,50000
E08,10000
E09,10000
E10,10000
Answer: 
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Create all three files in hdfs in directory called sparkl (We will do using Hue}.
However, you can first create in local filesystem and then
Step 2 : Load EmployeeManager.csv file from hdfs and create PairRDDs
val manager = sc.textFile("spark1/EmployeeManager.csv")
val managerPairRDD = manager.map(x=> (x.split(",")(0),x.split(",")(1)))
Step 3 : Load EmployeeName.csv file from hdfs and create PairRDDs
val name = sc.textFile("spark1/EmployeeName.csv")
val namePairRDD = name.map(x=> (x.split(",")(0),x.split('\")(1)))
Step 4 : Load EmployeeSalary.csv file from hdfs and create PairRDDs
val salary = sc.textFile("spark1/EmployeeSalary.csv")
val salaryPairRDD = salary.map(x=> (x.split(",")(0),x.split(",")(1)))
Step 4 : Join all pairRDDS
val joined = namePairRDD.join(salaryPairRDD}.join(managerPairRDD}
Step 5 : Now sort the joined results, val joinedData = joined.sortByKey()
Step 6 : Now generate comma separated data.
val finalData = joinedData.map(v=> (v._1, v._2._1._1, v._2._1._2, v._2._2))
Step 7 : Save this output in hdfs as text file.
finalData.saveAsTextFile("spark1/result.txt")
NEW QUESTION NO: 20
CORRECT TEXT
Problem Scenario 7 : You have been given following mysql database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish following.
1. Import department tables using your custom boundary query, which import departments between 1 to 25.
2 . Also make sure each tables file is partitioned in 2 files e.g. part-00000, part-00002
3 . Also make sure you have imported only two columns from table, which are department_id,department_name
Answer: 
See the explanation for Step by Step Solution and configuration.
Explanation:
Solutions :
Step 1 : Clean the hdfs tile system, if they exists clean out.
hadoop fs -rm -R departments
hadoop fs -rm -R categories
hadoop fs -rm -R products
hadoop fs -rm -R orders
hadoop fs -rm -R order_itmes
hadoop fs -rm -R customers
Step 2 : Now import the department table as per requirement.
sqoop import \
-connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
-password=cloudera \
-table departments \
-target-dir /user/cloudera/departments \
-m2\
-boundary-query "select 1, 25 from departments" \
-columns department_id,department_name
Step 3 : Check imported data.
hdfs dfs -Is departments
hdfs dfs -cat departments/part-m-00000
hdfs dfs -cat departments/part-m-00001

Posted 2018/7/26 16:38:31  |  Category: Cloudera  |  Tag: CCA175 certification guideCCA175 training materialsCCA175 online dumps
Copyright © 2026. GetCertKey All rights reserved.