Stop the virtual machine:
docker-machine stop name_virtual_system
Start a stopped virtual machine:
docker-machine start name_virtual_system
Delete virtual machine:
docker-machine rm name_virtual_system
Connect to virtual machine:
eval "$ (docker-machine env name_virtual_system)"
Disconnect Docker from VM:
eval $(docker-machine env -u)
Login via SSH:
docker-machine ssh name_virtual_system
Quit the virtual machine:
exit
Run the sleep 10 command in the virtual machine:
docker-machine ssh name_virtual_system 'sleep 10'
Running commands in BASH environment:
docker-machine ssh dev 'bash -c "sleep 10 && echo 1"'
Copy the dir folder to the virtual machine:
docker-machine scp -r /dir name_virtual_system:/dir
Make a request to the containers of the virtual machine:
curl $(docker-machine ip name_virtual_system):9000
Forward port 9005 of the host machine to port 9005 of the virtual machine:
docker-machine ssh name_virtual_system -f -N -L 9005:0.0.0.0:9005
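A quick way to check that the tunnel works is to request the forwarded port on the host (assuming some service inside the virtual machine listens on 9005):
curl localhost:9005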
Master initialization:
docker swarm init
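docker swarm init prints a join command with a one-time token; a worker is attached by running that command on the second machine (the token and address below are placeholders, the real ones are taken from the init output), after which the nodes can be listed on the manager:
docker swarm join --token <token_from_init_output> 192.168.99.100:2377
docker node ls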
Running multiple containers with the same EXPOSE:
essh@kubernetes-master:~/mongo-rs$ docker run --name redis -p 6379 -d redis
f3916da35b6ba5cd393c21d5305002b78c32b089a6cc01e3e2425930c9310cba
essh@kubernetes-master:~/mongo-rs$ docker ps | grep redis
f3916da35b6b redis "docker-entrypoint.s…" 8 seconds ago Up 6 seconds 0.0.0.0:32769->6379/tcp redis
essh@kubernetes-master:~/mongo-rs$ docker port reids
Error: No such container: reids
essh@kubernetes-master:~/mongo-rs$ docker port redis
6379/tcp -> 0.0.0.0:32769
essh@kubernetes-master:~/mongo-rs$ docker port redis 6379
0.0.0.0:32769
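If a predictable port is needed instead of an automatically assigned one, the host port is specified explicitly. A sketch with a second Redis on the fixed host port 6380 (the name redis2 is illustrative):
docker run --name redis2 -p 6380:6379 -d redis
docker port redis2 6379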
The first build approach is to copy all the files and then install; as a result, whenever any file changes, all packages will be reinstalled:
COPY ./ /src/app
WORKDIR /src/app
RUN npm install
Let's use caching and separate the static files from the installation:
COPY ./package.json /src/app/package.json
WORKDIR /src/app
RUN npm install
COPY . /src/app
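Put together, a minimal Dockerfile with this caching approach might look like this (a sketch for a Node.js service; the EXPOSE port and start command are assumptions):
FROM node:7
# rarely changes – the layer is cached until package.json changes
COPY ./package.json /src/app/package.json
WORKDIR /src/app
RUN npm install
# changes often – invalidates only the layers below it
COPY . /src/app
EXPOSE 3000
CMD ["npm", "start"]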
Using the node:7-onbuild base image template:
$ cat Dockerfile
FROM node:7-onbuild
EXPOSE 3000
$ docker build .
In this case, files that do not need to be included in the image, such as system files, for example, Dockerfile, .git, node_modules and files with keys, need to be added to .dockerignore.
-v /config
docker cp config.conf name_container:/config/
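A complete sketch (name_container and the redis image are used purely for illustration): the anonymous volume is declared at startup, after which the config is copied into the running container:
docker run -d -v /config --name name_container redis
docker cp config.conf name_container:/config/
docker exec name_container ls /config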
Real-time statistics of used resources:
essh@kubernetes-master:~/mongo-rs$ docker ps -q | docker stats
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
c8222b91737e mongo-rs_slave_1 19.83% 44.12MiB / 15.55GiB 0.28% 54.5kB / 78.8kB 12.7MB / 5.42MB 31
aa12810d16f5 mongo-rs_backup_1 0.81% 44.64MiB / 15.55GiB 0.28% 12.7kB / 0B 24.6kB / 4.83MB 26
7537c906a7ef mongo-rs_master_1 20.09% 47.67MiB / 15.55GiB 0.30% 140kB / 70.7kB 19.2MB / 7.5MB 57
f3916da35b6b redis 0.15% 3.043MiB / 15.55GiB 0.02% 13.2kB / 0B 2.97MB / 0B 4
f97e0697db61 node_api 0.00% 65.52MiB / 15.55GiB 0.41% 862kB / 8.23kB 137MB / 24.6kB 20
8c0d1adc9b9c portainer 0.00% 8.859MiB / 15.55GiB 0.06% 102kB / 3.87MB 57.8MB / 122MB 20
6018b7e3d9cd node_payin 0.00% 9.297MiB / 15.55GiB 0.06% 222kB / 3.04kB 82.4MB / 24.6kB 11
^C
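Note that docker stats without arguments already streams statistics for all running containers; to limit it to specific ones, their IDs are passed as arguments:
docker stats $(docker ps -q)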
When creating images, you need to consider:
* when a large layer changes, it will be recreated, so it is often better to split it: for example, create one layer with 'npm i' and copy the code in a second one;
* if a file in the image is large and the container changes it, then the file will be completely copied from the read-only image layer into the writable layer; therefore, containers are supposed to be lightweight, and content is usually placed in a special storage. code-as-a-service: 12 factors (12factor.net):
* Codebase – one service, one repository;
* Dependencies – all dependent services are declared in the config;
* Config – configs are available through the environment;
* Backing services – exchange data with other services over the network via an API;
* Processes – one service, one process, which allows, in the event of a crash, to unambiguously track it (the container itself terminates) and restart it;
* independence from the environment and no influence on it;
* CI/CD – code control (Git) – build (Jenkins, GitLab) – release (Docker, Jenkins) – deploy (Helm, Kubernetes). Keeping the service lightweight is important, but there are programs not designed to run in containers, such as databases. Due to their peculiarities, certain requirements are imposed on their launch, and the profit is limited: because of big data they are slow to scale, a rolling update is unlikely, and a restart must be performed on the same nodes as their data for reasons of access performance;
* Config – service relationships are defined in the configuration, for example, docker-compose.yml;
* Port binding – services communicate through ports, while the port can be selected automatically; for example, if EXPOSE PORT is specified in the Dockerfile, then when a container is started with the -P flag, it will be bound to a free port automatically;
* Env – environment settings are passed through environment variables, not through configs, which allows them to be added to the service configuration, for example, docker-compose.yml (see the sketch after this list);
* Logs – logs are streamed over the network, for example, to ELK, or printed to standard output, which is already streamed by Docker.
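As an illustration of the Config, Port binding and Env points, a minimal docker-compose.yml might look like this (the service names, image and variables are hypothetical):
version: '2'
services:
  api:
    image: myapp/api:1.0
    ports:
      - 8080:3000 # Port binding: host port 8080 -> container port 3000
    environment:
      - DB_HOST=db # Env: settings come in through environment variables
      - DB_PORT=5432
  db:
    image: postgres:12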
Dockerd internals:
essh@kubernetes-master:~/mongo-rs$ ps aux | grep dockerd
root 6345 1.1 0.7 3257968 123640 ? Ssl Jul05 76:11 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
essh 16650 0.0 0.0 21536 1036 pts/6 S+ 23:37 0:00 grep --color=auto dockerd
essh@kubernetes-master:~/mongo-rs$ pgrep dockerd
6345
essh@kubernetes-master:~/mongo-rs$ pstree -c -p -A $(pgrep dockerd)
dockerd(6345)-+-docker-proxy(720)-+-{docker-proxy}(721)
              |                   |-{docker-proxy}(722)
              |                   |-{docker-proxy}(723)
              |                   |-{docker-proxy}(724)
              |                   |-{docker-proxy}(725)
              |                   |-{docker-proxy}(726)
              |                   |-{docker-proxy}(727)
              |                   `-{docker-proxy}(728)
              |-docker-proxy(7794)-+-{docker-proxy}(7808)
Dockerfile:
* clean up package manager caches (apt-get, pip and others): this cache is not needed in production, it only takes up space and loads the network; nowadays, though, this is often less relevant, since there are multi-stage builds, but more on that below;
* group commands for the same entities, for example, fetch the APT cache, install the programs and remove the cache: in one instruction the layer contains only the programs, while with separate instructions it contains the programs and the cache, because if the cache is not deleted in the same instruction, it will be saved in the layer, regardless of subsequent actions;
* separate instructions by frequency of change: for example, if the software installation and the code are not split, then when anything in the code changes, instead of reusing the ready-made layer with the programs, they will be reinstalled, which entails significant image preparation time, which is critical for developers:
ADD ./app/package.json /app
RUN npm install
ADD ./app /app
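The multi-stage build mentioned above takes this further: build tools and caches stay in an intermediate image entirely. A minimal sketch for the same Node.js application (the stage name and paths are illustrative):
FROM node:7 AS build
WORKDIR /src/app
COPY ./app/package.json .
RUN npm install
COPY ./app .

FROM node:7-slim
WORKDIR /src/app
# only the result is copied – the npm cache stays in the build stage
COPY --from=build /src/app .
CMD ["npm", "start"]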
Docker alternatives:
* Rocket or rkt – containers for the CoreOS operating environment (CoreOS was later acquired by Red Hat), specially designed around containers;
* Hyper-V – an environment for running Docker on the Windows operating system, which wraps the container in a lightweight virtual machine.
Docker has spun off its core components, which it uses as primitives and which have become standard components for implementing containers such as rkt, bundled into the containerd project:
* CRI-O – an OpenSource project aimed from the beginning at fully supporting the CRI (Container Runtime Interface) standards, the Runtime Specification (github.com/opencontainers/runtime-spec) and the Image Specification (github.com/opencontainers/image-spec), as a general interface for the interaction of the orchestration system with containers. Along with Docker, support for CRI-O 1.0 was added to Kubernetes (more on this below) starting with version 1.7 in 2017, as well as to MiniKube and Kubic. It has a CLI (Command Line Interface) implementation in the Podman project, which almost completely repeats the Docker commands, but without orchestration (Docker Swarm), and which is the default tool in Fedora Linux;
* CRI (Container Runtime Interface, kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/) – an environment for running containers, universally providing primitives (Executor, Supervisor, Metadata, Content, Snapshot, Events and Metrics) for working with Linux containers (namespaces, cgroups and so on);
* CNI (Container Networking Interface) – work with the network.
Portainer
The simplest monitoring option would be Portainer:
essh@kubernetes-master:~/microKubernetes$ cat << EOF > docker-compose.monitoring.yml
version: '2'
services:
  portainer:
    image: portainer/portainer
    command: -H unix:///var/run/docker.sock
    restart: always
    ports:
      - 9000:9000
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./portainer_data:/data
EOF
essh@kubernetes-master:~/microKubernetes$ docker-compose -f docker-compose.monitoring.yml up -d
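You can check that the container is up and the UI answers on port 9000:
docker ps | grep portainer
curl -I localhost:9000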
Monitoring with Prometheus
Monitoring – maintaining the continuity of operation, tracking the current situation (identifying and localizing incidents and sending notifications about them, for example, to SaaS PagerDuty), predicting possible situations, visualization, and building models for normal operation, AIOps (Artificial Intelligence for IT Operations, https://www.gartner.com/en/information-technology/glossary/aiops-artificial-intelligence-operations).
Monitoring contains the following steps:
* identification of the incident;
* notification of the incident;
* localization;
* resolution.
Monitoring can be classified by level into the following types:
* infrastructure (operating system, servers, Kubernetes, DBMS);
* application (application logs, traces, application events);
* business processes (points in transactions, traces of transactions).
Monitoring can be classified by principle:
* distributed (traces);
* synthetic (availability);
* AIOps (forecasting, anomalies).
Monitoring is divided into two parts by the degree of analysis: logging systems and incident investigation systems. An example of a logging system is the ELK stack, and of incident investigation – Sentry (SaaS). For microservices, a request tracing system such as Jaeger or Zipkin is also added. The logging system simply writes all the logs that are available. The incident investigation system writes much more information, but writes it only in case of errors in the application, for example, environment parameters, versions of installed packages, stack trace and so on, which allows you to get maximum information when viewing the error, rather than collecting it piece by piece from the server and the Git repository. But the set and format of information depends on the environment, therefore the incident system needs to be integrated with various language platforms, and even better with specific frameworks. So Sentry sends environment variables, a piece of code and an indication of where the error occurred, parameters of the program and platform environment, and method calls.
Ecosystem monitoring can be divided into:
* built into the cloud platform: Azure Monitoring, Amazon CloudWatch, Google Cloud Monitoring;
* provided as a service with support for various integrations (SaaS): DataDog, NewRelic;
* CloudNative: Prometheus;
* for dedicated on-premises servers: Zabbix.
Zabbix was developed in 1998 and released to OpenSource under the GPL in 2001. At that time the interface was traditional: without any design, with a lot of tabs, selectors and the like. Since it was developed for internal needs, it contains specific solutions. It is oriented toward monitoring devices and their components such as disks, networks, printers, routers and the like. For interaction, the following can be used:
Agents – installed on servers, collect many metrics and send them to the Zabbix server;
HTTP – Zabbix makes requests over HTTP, for example, to printers;
SNMP – a network protocol for communicating with network devices;
IPMI – a protocol for communicating with server hardware.
In 2019, Gartner presented its rating of monitoring systems in its Magic Quadrant:
* Dynatrace;
* Cisco (AppDynamics);
* New Relic;
* Broadcom (CA Technologies);
* Riverbed and Microsoft;
* IBM;
* Oracle;
* SolarWinds;
* Micro Focus;
* ManageEngine and Tingyun.
Not included in the quadrant:
* Correlsense;
* Datadog;
* Elastic;
* Honeycomb;
* Instana;
* JenniferSoft;
* LightStep;
* Nastel Technologies;
* SignalFx;
* Splunk;
* Sysdig.
When we run an application in a Docker container, all the standard output (what is displayed in the console) of the running program (process) is buffered. We can view this buffer with docker logs name_container. If we follow the Docker ideology of "one process – one container", we can view the logs of an individual program. The less and tail commands are convenient for viewing logs: the first allows you to scroll through the logs with the keyboard arrows and search for what you need by matches or by a regular expression pattern, like the vi text editor; the second displays only the number of trailing lines we need.
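For example (name_container is a placeholder):
docker logs name_container | less # scroll and search through the whole log
docker logs --tail 100 name_container # only the last 100 lines
docker logs -f name_container # follow the log in real time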
An important criterion for ensuring smooth operation is control of free space. If there is no space left, the database will not be able to write data, and with other components the situation can be even more dire than the loss of new data. Docker has limit settings, and not only for individual containers: for example, the storage driver by default requires at least 10% of free space. During image building or container startup, an error may be thrown that the specified limits have been exceeded. To change the default settings, you need to pass the settings to the Dockerd server, stopping it first with service docker stop (all containers will be stopped) and resuming it afterwards with service docker start (the containers will be resumed). Settings can be set as options, for example, /bin/dockerd --storage-opt dm.basesize=50G.
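Free space can be checked both at the host level and at the level of Docker objects:
df -h /var/lib/docker # free space on the partition with Docker data
docker system df # how much space images, containers and volumes take
docker system prune # remove stopped containers, unused networks and dangling images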
In Portainer we get authorization and control over our containers, with the ability to create them for testing and to see graphs of processor and memory usage. More will require a monitoring system. There are quite a few monitoring systems, for example, Zabbix, Graphite, Prometheus, Nagios, InfluxData, OkMeter, DataDog, Bosun, Sensu and others, of which Zabbix and Prometheus are the most popular. The first is used traditionally: admins love it for its ease of use (all you need is SSH access to the server) and its low level, which allows you to work not only with servers but also with other hardware, such as routers. The second is the opposite of the first: it is focused exclusively on collecting metrics and monitoring, designed as a ready-made solution rather than a framework, and it won over programmers: choose the metrics and get the graphs. The key difference between Zabbix and Prometheus lies not in the preference of some to configure everything in detail and of others to spend much less time, but in their scope. Zabbix is focused on setting up work with specific hardware, which can be anything, and often very exotic in a corporate environment, and for each such entity metric collection is written manually and a graph is configured manually. For the dynamically changing environment of cloud solutions, even if it is just a Docker container, and even more so Kubernetes, in which a huge number of entities are constantly created and the entities themselves, apart from the general environment, are not of particular interest, it is not suitable; for this, Prometheus has Service Discovery built in, and navigation through Kubernetes namespaces, balancers (services) and groups of containers (PODs) is supported, which can be configured in Grafana in the form of tables. According to The New Stack's 2017 Kubernetes User Experience survey, it is used in Kubernetes in 63% of cases; the rest use rarer cloud monitoring tools.
Metrics can be system metrics (for example, CPU, RAM, disk) and application metrics (service and application metrics). System metrics are divided into core metrics, which are used by Kubernetes for scaling and the like, and non-core metrics, which are not used by Kubernetes. Here is an example of bundles for collecting metrics:
* cAdvisor + Heapster + InfluxDB
* cAdvisor + collectd + Heapster
* cAdvisor + Prometheus
* snapd + Heapster
* snapd + SNAP cluster-level agent
* Sysdig
There are many monitoring systems and services on the market. We will consider exactly the OpenSource ones, which can be installed in your cluster. They can be divided by the model of obtaining metrics: into those that collect metrics by polling, and those that expect metrics to be pushed to them. The latter are simpler both in structure and in use on a small scale. An example would be InfluxDB, which is a database that you can write to. The downside of this solution is the difficulty of scaling both in terms of support and load: if all services write at the same time, they can overload the monitoring system, and it is difficult to scale, since the endpoint is registered in each service. The first group, which practices the pull model of interaction, includes Prometheus. It is also a database, with a daemon that polls services based on their registrations in the configuration file and pulls metrics in a specific format, for example:
cpu_usage 2
cpu_usage{app="myapp"} 2
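A minimal sketch of a Prometheus configuration (prometheus.yml) registering such a service for polling; the job name and target address are assumptions:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'myapp'
    static_configs:
      - targets: ['localhost:9102'] # the service exposing /metrics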
Prometheus is a mature product: it was developed in 2012, and in 2016 it was included in the CNCF (Cloud Native Computing Foundation) consortium. Prometheus consists of:
* a TSDB (Time Series Database), which looks more like a storage queue for metrics with a specified retention period, for example, a week, allowing hundreds of thousands of metrics per second to be processed. This database is local to Prometheus and does not support horizontal scaling; in the case of Prometheus, scaling is achieved by raising several of its instances and sharding them. Prometheus supports data aggregation, which is useful for reducing the amount of accumulated data, as well as archiving the database from memory to disk.
* Service Discovery support for Kubernetes out of the box, through the public API, by polling PODs filtered according to the config, on TCP port 9121.
* Grafana (a separate product, added by default) – a universal UI with dashboards and charts that supports Prometheus via PromQL.
To expose metrics, you can use ready-made solutions or develop your own. For the vast majority of system metrics there is an exporter, while application metrics often have to be exposed by yourself. Exporters are general and specialized. For example, NodeExporter provides most node metrics, including process metrics, but there are also more specialized exporters. If you run Prometheus without exporters, it will give out almost a thousand metrics, but these are the metrics of Prometheus itself, and there will be no node_* prefixes in them. For these metrics to appear, you need to enable NodeExporter and write its URL into the Prometheus configuration to collect the metrics it provides. For NodeExporter, this can be localhost or the node address and port 9100 (its default; see the sketch after the list below). Usually, exporters specialize in product-specific metrics, for example:
* node_exporter – node metrics (CPU, Memory, Network);
* snmp_exporter – SNMP protocol metrics;
* mysqld_exporter – MySQL database metrics;
* consul_exporter – Consul database metrics;
* graphite_exporter – Graphite database metrics;
* memcached_exporter – Memcached database metrics;
* haproxy_exporter – HAProxy balancer metrics;
* cAdvisor – Docker daemon metrics – container monitoring;
* process-exporter – detailed process metrics;
* metrics-server – CPU, Memory, File-descriptors, Disks;
* kube-state-metrics – deployments, PODs, nodes.
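As a sketch, NodeExporter can be started in a container (9100 is its default port) and then registered in the Prometheus scrape configuration (the job name is arbitrary):
docker run -d --name node-exporter -p 9100:9100 prom/node-exporter
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']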
Prometheus supports remote data writing (https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write), for example, to the distributed TSDB storage for Prometheus – Weave Works Cortex – using a setting in the configuration, which allows data from multiple Prometheus instances to be analyzed together:
remote_write:
  - url: "http://localhost:9000/receive"
Let's consider its work on a ready-made instance. I'll take www.katacoda.com/courses/istio/deploy-istio-on-kubernetes for this and go through it. Our Prometheus is located on its standard port 9090:
controlplane$ kubectl -n istio-system get svc prometheus
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus ClusterIP 10.99.70.170 <none> 9090/TCP 6m59s
To open its UI, I'll go to the WEB tab and change the address from 80 to 9090: https://2886795314-9090-ollie08.environments.katacoda.com/graph. In the input line, you need to enter the desired metric in PromQL (Prometheus query language), just as you would use InfluxQL for InfluxDB or SQL for TimescaleDB. For example, I will enter "cpu", and it will display a list of metrics containing it. There are two tabs under the line: a tab with a graph and a tab for displaying in tabular form. I will be looking at the tabular view. I selected machine_cpu_cores and clicked Execute. Common metrics usually have similar names, for example, machine_cpu_cores and node_cpu_cores. The metrics themselves consist of the name, tags in curly brackets and the value of the metric; in the same form they need to be requested, and in the same form they are displayed in the table.
machine_cpu_cores{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",instance="controlplane",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="controlplane",kubernetes_io_os="linux"}
machine_cpu_cores{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",instance="node01",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="node01",kubernetes_io_os="linux"}
For memory, you can select machine_memory_bytes – the size of the RAM on the machine (server or virtual):
machine_memory_bytes{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",instance="controlplane",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="controlplane",kubernetes_io_os="linux"}
machine_memory_bytes{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",instance="node01",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="node01",kubernetes_io_os="linux"}
But bytes are hard to read, so we will use PromQL to convert to GB: machine_memory_bytes / 1000 / 1000 / 1000
{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",instance="controlplane",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="controlplane",kubernetes_io_os="linux"}
{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",instance="node01",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="node01",kubernetes_io_os="linux"}
Let's enter memory_bytes into the search to find container_memory_usage_bytes – the memory used. The list contains all containers and their current memory consumption; I will give only three:
container_memory_usage_bytes{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",container="POD",container_name="POD",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod0e619e5dc53ed9efcef63f5fe1d7ee71.slice/docker-b6549e892baa8687e4e98a106024b5c31a4af077d7c5544af03a3c72ec8997e0.scope",image="k8s.gcr.io/pause:3.1",instance="controlplane",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="controlplane",kubernetes_io_os="linux",name="k8s_POD_etcd-controlplane_kube-system_0e619e5dc53ed9efcef63f5fe1d7ee71_0",namespace="kube-system",pod="etcd-controlplane",pod_name="etcd-controlplane"} 45056
container_memory_usage_bytes{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",container="POD",container_name="POD",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod5a815a40_f2de_11ea_88d2_0242ac110032.slice/docker-76711789af076c8f2331d8212dad4c044d263c5cc3fa333347921bd6de7950a4.scope",image="k8s.gcr.io/pause:3.1",instance="controlplane",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="controlplane",kubernetes_io_os="linux",name="k8s_POD_kube-proxy-nhzhn_kube-system_5a815a40-f2de-11ea-88d2-0242ac110032_0",namespace="kube-system",pod="kube-proxy-nhzhn",pod_name="kube-proxy-nhzhn"}
container_memory_usage_bytes{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",container="POD",container_name="POD",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod6473aeea_f2de_11ea_88d2_0242ac110032.slice/docker-24ef0e898e1bb7dec9854b67291171aa9c5715d7683f53bdfc2cef49a19744fe.scope",image="k8s.gcr.io/pause:3.1",instance="node01",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="node01",kubernetes_io_os="linux",name="k8s_POD_kube-proxy-6v49x_kube-system_6473aeea-f2de-11ea-88d2-0242ac110032_0",namespace="kube-system",pod="kube-proxy-6v49x",pod_name="kube-proxy-6v49x"}
Let's set a label that is contained in the metrics to filter out one container: container_memory_usage_bytes{container_name="prometheus"}
container_memory_usage_bytes{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",container="prometheus",container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podeaf4e833_f2de_11ea_88d2_0242ac110032.slice/docker-b314fb5c4ce8894f872f05bdd524b4b7d6ce5415aeb3fb91d6048441c47584a6.scope",image="sha256:b82ef1f3aa072922c657dd2b2c6b59ec0ac88e69c447998291066e1f67e741d8",instance="node01",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="node01",kubernetes_io_os="linux",name="k8s_prometheus_prometheus-5b77b7d695-knf44_istio-system_eaf4e833-f2de-11ea-88d2-0242ac110032_0",namespace="istio-system",pod="prometheus-5b77b7d695-knf44",pod_name="prometheus-5b77b7d695-knf44"}
283443200
Let's convert to MB: container_memory_usage_bytes{container_name="prometheus"} / 1000 / 1000
{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",container="prometheus",container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podeaf4e833_f2de_11ea_88d2_0242ac110032.slice/docker-b314fb5c4ce8894f872f05bdd524b4b7d6ce5415aeb3fb91d6048441c47584a6.scope",image="sha256:b82ef1f3aa072922c657dd2b2c6b59ec0ac88e69c447998291066e1f67e741d8",instance="node01",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="node01",kubernetes_io_os="linux",name="k8s_prometheus_prometheus-5b77b7d695-knf44_istio-system_eaf4e833-f2de-11ea-88d2-0242ac110032_0",namespace="istio-system",pod="prometheus-5b77b7d695-knf44",pod_name="prometheus-5b77b7d695-knf44"}
286.18752
Let's filter by instance: container_memory_usage_bytes{container_name="prometheus", instance="node01"}
{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",container="prometheus",container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podeaf4e833_f2de_11ea_88d2_0242ac110032.slice/docker-b314fb5c4ce8894f872f05bdd524b4b7d6ce5415aeb3fb91d6048441c47584a6.scope",image="sha256:b82ef1f3aa072922c657dd2b2c6b59ec0ac88e69c447998291066e1f67e741d8",instance="node01",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="node01",kubernetes_io_os="linux",name="k8s_prometheus_prometheus-5b77b7d695-knf44_istio-system_eaf4e833-f2de-11ea-88d2-0242ac110032_0",namespace="istio-system",pod="prometheus-5b77b7d695-knf44",pod_name="prometheus-5b77b7d695-knf44"}
289.890304
And on the second node it is absent: container_memory_usage_bytes{container_name="prometheus", instance="node02"}
no data
There are also aggregation functions:
sum(container_memory_usage_bytes) / 1000 / 1000 / 1000
{} 22.812798976
max(container_memory_usage_bytes) / 1000 / 1000 / 1000
{} 3.6422983679999996
min(container_memory_usage_bytes) / 1000 / 1000 / 1000
{} 0
You can also group by labels, for example by instance: max(container_memory_usage_bytes) by (instance) / 1000 / 1000 / 1000
{instance="controlplane"} 1.641836544
{instance="node01"} 3.6622745599999997
You can perform operations on metrics with the same label sets and filter the result: container_memory_mapped_file / container_memory_usage_bytes * 100 > 80
{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",container="POD",container_name="POD",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode45f10af1ae684722cbd74cb11807900.slice/docker-5cb2f2083fbc467b8b394b27b69686d309f951450bcb910d509572aea9922806.scope",image="k8s.gcr.io/pause:3.1",instance="controlplane",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="controlplane",kubernetes_io_os="linux",name="k8s_POD_kube-controller-manager-controlplane_kube-system_e45f10af1ae684722cbd74cb11807900_0",namespace="kube-system",pod="kube-controller-manager-controlplane",pod_name="kube-controller-manager-controlplane"}
80.52631578947368
You can look at file system metrics using container_fs_limit_bytes, which produces a large list – I will give a few entries from it:
container_fs_limit_bytes{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",container="POD",container_name="POD",device="/dev/vda1",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod0e619e5dc53ed9efcef63f5fe1d7ee71.slice/docker-b6549e892baa8687e4e98a106024b5c31a4af077d7c5544af03a3c72ec8997e0.scope",image="k8s.gcr.io/pause:3.1",instance="controlplane",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="controlplane",kubernetes_io_os="linux",name="k8s_POD_etcd-controlplane_kube-system_0e619e5dc53ed9efcef63f5fe1d7ee71_0",namespace="kube-system",pod="etcd-controlplane",pod_name="etcd-controlplane"}