Local OpenShift Cluster Installation Guide

Preparation

Hosts

  • 1 control host, 1 master and 3 nodes
  • CentOS 7

Install packages on control host

  • Run yum install -y python2-passlib httpd-tools

Install and configure DNS on control host

  • Run yum install -y etcd and make sure the etcd service is running (systemctl start etcd)
  • Download or build skydns and run ./skydns -machines=http://127.0.0.1:2379 -addr=0.0.0.0:53
  • Create the DNS records:
curl -XPUT http://127.0.0.1:2379/v2/keys/skydns/tw/oc/master -d value='{"host":"10.202.128.192"}'
curl -XPUT http://127.0.0.1:2379/v2/keys/skydns/tw/oc/node1 -d value='{"host":"10.202.128.73"}'
curl -XPUT http://127.0.0.1:2379/v2/keys/skydns/tw/oc/node2 -d value='{"host":"10.202.128.93"}'
curl -XPUT http://127.0.0.1:2379/v2/keys/skydns/tw/oc/node3 -d value='{"host":"10.202.128.76"}'
  • On each node, configure the DNS resolver file /etc/resolv.conf with the following contents:
search oc.tw default.svc cluster.local
nameserver 127.0.0.1
nameserver 10.202.129.100
  • On every node, run hostname node*.oc.tw to correct the hostname (make it the same as its DNS name)
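
The etcd keys in the curl commands above follow the SkyDNS convention of storing a name under /skydns with its labels in reverse order, so master.oc.tw becomes /skydns/tw/oc/master. A minimal sketch of generating those commands for additional hosts follows. The helper names skydns_key and print_register_cmd are hypothetical, and the script only prints the curl commands instead of running them:

```shell
#!/bin/sh
# Hypothetical helpers: build SkyDNS record commands from name/IP pairs.
# ETCD_URL defaults to the local etcd endpoint used in this guide.
ETCD_URL="${ETCD_URL:-http://127.0.0.1:2379}"

# Print the etcd v2 key path for a DNS name (labels reversed under /skydns).
skydns_key() {
    key=""
    old_ifs="$IFS"
    IFS='.'
    # Split the name on dots and prepend each label, which reverses the order.
    for label in $1; do
        key="/$label$key"
    done
    IFS="$old_ifs"
    printf '/v2/keys/skydns%s\n' "$key"
}

# Print (do not run) the curl command that registers one host.
print_register_cmd() {
    printf "curl -XPUT %s%s -d value='{\"host\":\"%s\"}'\n" \
        "$ETCD_URL" "$(skydns_key "$1")" "$2"
}

print_register_cmd node1.oc.tw 10.202.128.73
```

Piping the printed line to sh on the control host would apply it. Printing first makes it easy to review the key path before touching etcd.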

Installation

  • Clone the openshift-ansible repo: git clone https://github.com/openshift/openshift-ansible.git
  • Prepare a hosts (inventory) file with the following contents:
# Create OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes
etcd
nfs

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=root
openshift_deployment_type=origin
openshift_release=3.6.0
openshift_disable_check=disk_availability,docker_storage,memory_availability,docker_image_availability,package_availability,package_version
openshift_metrics_install_metrics=true
openshift_master_default_subdomain=apps.oc.tw
openshift_logging_install_logging=true
openshift_master_cluster_public_hostname=master.oc.tw
openshift_hosted_registry_storage_kind=nfs
openshift_hosted_registry_storage_access_modes=['ReadWriteMany']
openshift_hosted_registry_storage_nfs_directory=/exports
openshift_hosted_registry_storage_nfs_options='*(rw,root_squash)'
openshift_hosted_registry_storage_volume_name=registry
openshift_hosted_registry_storage_volume_size=10Gi
openshift_logging_storage_kind=nfs
openshift_logging_storage_access_modes=['ReadWriteOnce']
openshift_logging_storage_nfs_directory=/exports
openshift_logging_storage_nfs_options='*(rw,root_squash)'
openshift_logging_storage_volume_name=logging
openshift_logging_storage_volume_size=10Gi
openshift_metrics_storage_kind=nfs
openshift_metrics_storage_access_modes=['ReadWriteOnce']
openshift_metrics_storage_nfs_directory=/exports
openshift_metrics_storage_nfs_options='*(rw,root_squash)'
openshift_metrics_storage_volume_name=metrics
openshift_metrics_storage_volume_size=10Gi

openshift_master_identity_providers=[{'name':'htpasswd_auth','login':'true','challenge':'true','kind':'HTPasswdPasswordIdentityProvider','filename':'/etc/origin/master/htpasswd'}]

# host group for masters
[masters]
master.oc.tw openshift_public_hostname=master.oc.tw

# host group for nodes, includes region info
[nodes]
master.oc.tw openshift_public_hostname=master.oc.tw
node1.oc.tw openshift_node_labels="{'region':'infra'}" openshift_public_hostname=node1.oc.tw
node2.oc.tw openshift_node_labels="{'region':'primary'}" openshift_public_hostname=node2.oc.tw
node3.oc.tw openshift_node_labels="{'region':'primary'}" openshift_public_hostname=node3.oc.tw

[etcd]
master.oc.tw openshift_public_hostname=master.oc.tw

[nfs]
master.oc.tw
  • Run ansible-playbook -i ./hosts ~/openshift-ansible/playbooks/byo/config.yml to install.
  • Run ansible-playbook -i ./hosts ~/openshift-ansible/playbooks/adhoc/uninstall.yml to uninstall, then check the /etc/origin folder on every node to make sure all configuration has been removed.
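
Before running the playbook, it can help to confirm that every host named in the inventory is reachable over SSH without a password. A minimal sketch, assuming the inventory layout shown above: inventory_hosts is a hypothetical helper that prints the host names listed under one section, and the commented loop shows one way it could be used.

```shell
#!/bin/sh
# Hypothetical helper: print the host names listed under one inventory section.
# Usage: inventory_hosts <section> <inventory-file>
inventory_hosts() {
    awk -v section="[$1]" '
        /^\[/ { in_section = ($0 == section); next }   # a new section header
        in_section && NF && $1 !~ /^#/ { print $1 }    # first field is the host
    ' "$2"
}

# Example: check SSH access to every node without prompting for a password.
# for h in $(inventory_hosts nodes ./hosts); do
#     ssh -o BatchMode=yes "root@$h" true || echo "cannot reach $h"
# done
```

Ansible's own ping module (ansible -i ./hosts OSEv3 -m ping) performs a similar check and also verifies that Python is usable on the target.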

NetworkManager must be installed

  • Install NetworkManager on each node: run yum install -y NetworkManager && systemctl enable NetworkManager && systemctl start NetworkManager, then run hostname master.oc.tw (or the node's own name) to correct the hostname.
  • Watch for master or node restarts, because a restart reconfigures the hostname. When that happens, run hostname master.oc.tw again right away. Configuring NetworkManager to assign a static hostname may work as well.
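
One way to avoid re-running hostname after every restart is to set a static hostname with hostnamectl, which writes it to /etc/hostname on CentOS 7. This is a sketch under the assumption that a static hostname survives the restarts described above, which has not been verified for this setup. The helper name ensure_hostname is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: set a persistent hostname only when it differs.
ensure_hostname() {
    wanted="$1"
    current="$(hostname)"
    if [ "$current" = "$wanted" ]; then
        echo "hostname already $wanted"
    else
        # hostnamectl stores the static hostname in /etc/hostname.
        hostnamectl set-hostname "$wanted" &&
            echo "hostname changed from $current to $wanted"
    fi
}

# Example (run as root on the master):
# ensure_hostname master.oc.tw
```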

Cannot find file /etc/origin/node/resolv.conf

  • Create resolv.conf on each node: cp /etc/resolv.conf /etc/origin/node/resolv.conf

Cannot resolve external DNS names (e.g. github.com) from a container

  • Edit /etc/dnsmasq.d/origin-dns.conf and add the line server=10.202.129.100
  • Restart dnsmasq: systemctl restart dnsmasq.service
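
When the same fix has to be applied on several nodes, an idempotent edit avoids adding the server line twice. A minimal sketch; ensure_line is a hypothetical helper, not part of dnsmasq or OpenShift:

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if it is not already there.
# Usage: ensure_line <line> <file>
ensure_line() {
    line="$1"
    file="$2"
    # -x matches the whole line, -F treats it as a fixed string, -q is silent.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (as root on a node):
# ensure_line 'server=10.202.129.100' /etc/dnsmasq.d/origin-dns.conf
# systemctl restart dnsmasq.service
```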

sti builder cannot push image to docker-registry.default.svc:5000 (no route to host)

  • Edit /etc/resolv.conf on the node so that it contains:
search oc.tw default.svc cluster.local
nameserver 127.0.0.1
nameserver 10.202.129.100

oc get nodes on master raises a certificate error

  • Run cp -v /etc/origin/master/admin.kubeconfig ~/.kube/config to copy the admin kubeconfig into your home directory

Common ways to debug

  • systemctl status *.service
  • journalctl -xe
  • journalctl --unit dnsmasq

NFS PV configuration

Errors from oc describe pv pv0001

Recycle failed: unexpected error creating recycler pod: pods "recycler-for-pv0001" is forbidden: service account openshift-infra/pv-recycler-controller was not found, retry after the service account is created

  • Create the service account: oc create sa pv-recycler-controller -n openshift-infra
  • Grant it the hostmount-anyuid SCC:
    oadm policy add-scc-to-user hostmount-anyuid system:serviceaccount:openshift-infra:pv-recycler-controller

Create NFS PV

  • Edit /etc/exports.d/openshift-ansible.exports and add a line such as /exports/pvs/pv0001 *(rw,root_squash)
  • Run: mkdir -p /exports/pvs/pv0001 && chown nfsnobody:nfsnobody /exports/pvs/pv0001 && chmod 777 /exports/pvs/pv0001
  • Check that the export mounts correctly from a node by running mount -t nfs master.oc.tw:/exports/pvs/pv0001 ./t
  • Create pv.yml with the contents below (the example uses pv0005; substitute your own PV name and path), then run oc create -f pv.yml to create the PV:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0005
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    path: /exports/pvs/pv0005
    server: master.oc.tw
  persistentVolumeReclaimPolicy: Recycle
  • Debug:
    • Run ps aux | grep mount on the node to see whether the mount succeeded
    • Run oc describe po/xxx on the master to see the status of the container
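
Creating several volumes by hand means editing the same manifest repeatedly. A minimal sketch of generating it instead, assuming every PV lives under /exports/pvs/<name> on the NFS server as in the steps above. render_pv is a hypothetical helper, and the default size and server are taken from the example manifest.

```shell
#!/bin/sh
# Hypothetical helper: print an NFS PersistentVolume manifest.
# Usage: render_pv <name> [size] [nfs-server]
render_pv() {
    name="$1"
    size="${2:-10Gi}"
    server="${3:-master.oc.tw}"
    cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: $name
spec:
  capacity:
    storage: $size
  accessModes:
    - ReadWriteOnce
  nfs:
    path: /exports/pvs/$name
    server: $server
  persistentVolumeReclaimPolicy: Recycle
EOF
}

# Example: create pv0006 through pv0008 on the master.
# for n in pv0006 pv0007 pv0008; do
#     render_pv "$n" | oc create -f -
# done
```

The export line and directory for each PV still have to exist on the NFS server before pods can mount the volume.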

Local service dns resolution

Logging issue

The EFK stack uses a Kubernetes DaemonSet to schedule a logging-fluentd-xx pod on every node to collect container logs.

  • List the DaemonSets with oc get ds and make sure the pod count matches the node count.
  • Verify the DaemonSet settings with oc edit ds/logging-fluentd. Check the nodeSelector property.
  • Rsh into one of the fluentd pods and read run.sh to see what it does.
  • To help with debugging, edit the DaemonSet with oc edit ds/logging-fluentd and add the environment variable ENABLE_MONITOR_AGENT=true (Refer here)
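
The first check in the list can be scripted. A minimal sketch, assuming the logging stack is deployed in the logging project and the DaemonSet is named logging-fluentd as above. fluentd_coverage is a hypothetical helper; it reads the desiredNumberScheduled field from the DaemonSet status and compares it with the number of nodes.

```shell
#!/bin/sh
# Hypothetical helper: compare the node count with the number of fluentd pods
# the DaemonSet wants to schedule. Assumes the "logging" project by default.
fluentd_coverage() {
    ns="${1:-logging}"
    # The arithmetic expansion strips any padding that wc adds.
    nodes=$(( $(oc get nodes --no-headers | wc -l) ))
    desired=$(oc get ds logging-fluentd -n "$ns" \
        -o jsonpath='{.status.desiredNumberScheduled}')
    echo "nodes=$nodes desired_fluentd_pods=$desired"
    # Succeeds only when every node is targeted by the DaemonSet.
    [ "$nodes" -eq "$desired" ]
}

# Example (on the master):
# fluentd_coverage || echo "some nodes are not selected by the DaemonSet"
```

A mismatch usually points at the nodeSelector: nodes that lack the label it requires are not counted in the desired number.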

Common management tasks

  • Log in as admin: ssh into the master; oc commands run there act as the cluster admin
  • Add a user through htpasswd: run htpasswd -b /etc/origin/master/htpasswd admin admin on the master (creates user admin with password admin)
  • Give a user access to a project: oadm policy add-role-to-user admin gmliao -n logging (grants user gmliao the admin role in the logging project)

Node Management

Add node

Refer here: https://docs.openshift.org/latest/install_config/adding_hosts_to_existing_cluster.html#adding-nodes-advanced

  • Add new_nodes (or new_masters) to the [OSEv3:children] section of the hosts file
  • Add a new section:
[new_nodes]
node1.oc.tw openshift_node_labels="{'region':'infra'}" openshift_public_hostname=node1.oc.tw
node2.oc.tw openshift_node_labels="{'region':'primary'}" openshift_public_hostname=node2.oc.tw
  • Run installation: ansible-playbook -i hosts ~/openshift-ansible/playbooks/byo/openshift-node/scaleup.yml
  • After the run succeeds, move the hosts listed in the [new_nodes] section to the [nodes] section

Remove node

Refer here: https://docs.openshift.org/latest/admin_guide/manage_nodes.html#deleting-nodes

Reference